Writing Libraries in PHP
Intended Audience
Introduction
General Rules
Define the Goal
Document
Embrace Change
Use it Yourself
Specific Guidelines
The two types of API
Creating Methods
Naming Methods
File Trees
Plan to Expand
Beware of Infinity
Conclusion
About The Author
Intended Audience
This tutorial is intended for PHP programmers who are interested in writing reusable code. Experience with PHP4 and familiarity with writing classes are assumed. The article is primarily geared toward programmers who are new to writing code libraries, but experienced programmers may find the article informative as well.
Introduction
The core of any programming language is its API. The API (application programming interface) is the set of available functions that allow a programmer to use the programming language. Contrast the API with the language's syntax, which is the set of rules for stringing the various commands together so the interpreter can understand them.
Most of the documentation on PHP is focused on explaining how the various APIs available in PHP work. Likewise, if you create a PHP library, most of the work usually involves sorting out how its API will work.
When you code a library, your primary goal is to write code that can be reused. You might be concerned with code reuse within a single application or maybe you want to publish the library for others to use. Taking the time to code your library properly results in code that often becomes more useful than you initially expected, and reduces the time it takes you to create the rest of your application. It also makes it easier to involve other programmers in your project. Obviously, a well-written library is much more likely to be useful to other programmers than one that was created carelessly.
Libraries generally take one of two forms: functions or classes. PHP is a language that I generally classify as "hybrid" because it has most of the capabilities of an object-oriented language, yet the native PHP API is functional in nature. The shortcomings of PHP's object programming are not bad enough to justify avoiding class libraries. In fact, the examples and explanations in this article are based entirely on class libraries, so I'll be covering some of the workarounds for those shortcomings.
General Rules
Define the Goal
The first thing to do when writing a class library is to determine the goal of the library (actually, this should be done before starting any kind of project, whether programming or not). I could go into a long discussion of exactly how to define the goal and exactly how much detail is required, but suffice it to say that more is better.
I've seen this called "defining the ideal scene." You build a little utopia in your mind and imagine what your library would do if it were perfect. Identify the key elements of this success. Is it simple? Is speed a critical concern? Must it accept many different types of input, or can you establish firm guidelines for input? Security is always a concern, so will your library handle it, or will it be up to the application that uses the library? Defining what the library will not do is often just as important as defining what it will do.
Be sure to document the goal. Depending on how your project is organized, you might include it in a README file. Documenting the goal helps keep you focused on what's important. Everyone has time constraints, and focusing on what's important allows you to make the best use of your time. It also keeps you from turning the library into something it was never intended to be. The term "feature creep" was coined by people who either never documented a goal for their project, or ignored the goal after they started working. Documenting the goal is also important if the library is going to be made public. It keeps you from getting requests for features that don't belong, and avoids misunderstanding.
Document
Begin documenting immediately after you determine what your goal is. Actually, documenting the goal of your library bleeds right over into documenting the rest of the library.
Some people think that a library should be documented after it is written. I disagree. I think the major problem with a lot of software on the market is poor documentation. This goes for open-source software as well, and can be crippling to the success of a library of any type.
Documenting prior to development seems to help in other ways as well. Getting your ideas down in documentation form can speed actual code development considerably. It also helps in the same way as having a goal defined, by keeping you (and anyone else who may be helping with development) on the right track.
Embrace Change
If you can get the API for your library correct on the first try, please let me know so I can lay a tribute at your feet. For the rest of us mortals, I believe that the easiest way to accept change is to see it as a magical experience by which the code improves. This gives you a psychological edge over "Darn, I've got to change the code again."
But embracing change is much more than just a positive frame of mind. It is an all-aspects bear hug of involvement. When an aspect of your library changes, you need to do a number of things right away:
- Update the library itself to implement the change
- Update the documentation to reflect the change
- Plan for compatibility problems
The third item can be handled a number of ways. If you're aware that a lot of code already uses the old method, you should probably create a brand-new method that implements the new system while leaving the old one intact. You can often use a difference in the name to indicate the change. Unfortunately, this can lead to code bloat and the need to support old code longer than you would like. The best way to avoid this is to prominently document that the old way is deprecated and clearly state the version in which it will be removed from the library. Be sure to give people a few versions to make the switch.
Compatibility can often be accomplished with wrappers around the new methods that mimic the behavior of the old methods. For example:
function count_to($start, $increment, $end)
{
    // Old argument order, kept for compatibility: pass through to count_for()
    $this->count_for($start, $end, $increment);
}
function count_for($start, $end, $increment = 1)
{
    for ($i = $start; $i <= $end; $i += $increment) {
        echo $i;
    }
}
count_to has been replaced with count_for. The new function is simpler, since the common case where the increment is one does not require a third argument. By implementing the old function as a thin wrapper around the new one, we reduce code maintenance and bloat while still maintaining 100% compatibility with the old method.
Change is a large and very important topic. I could probably write an entire article on how to know when to change something and when not to, and (obviously) there would still be no way to know for sure that the decision was correct. I think the important thing is to consider every change with regard to the goal of the project, and decide whether the change makes the goal more likely to be accomplished or not. This is the primary gauge for knowing whether or not to make a change.
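Putting the wrapper together with the earlier advice to announce deprecations, here is a minimal sketch. It places the two functions in a hypothetical `Counter` class, returns the numbers as an array instead of echoing them so the result is easy to check, and uses `E_USER_DEPRECATED`, which needs PHP 5.3 or later rather than the PHP4 assumed elsewhere in this article.

```php
<?php
// Hypothetical Counter class: count_for() is the new API, count_to() the
// old one, kept as a thin wrapper so existing callers keep working.
class Counter
{
    // New API: the increment is optional and defaults to 1.
    function count_for($start, $end, $increment = 1)
    {
        $numbers = array();
        for ($i = $start; $i <= $end; $i += $increment) {
            $numbers[] = $i;
        }
        return $numbers;
    }

    // Old API with the old argument order. It still works, but it raises
    // an E_USER_DEPRECATED notice so callers know to switch before removal.
    function count_to($start, $increment, $end)
    {
        trigger_error('count_to() is deprecated; use count_for() instead',
            E_USER_DEPRECATED);
        return $this->count_for($start, $end, $increment);
    }
}

$counter = new Counter;
echo implode(' ', $counter->count_for(1, 5)), "\n"; // 1 2 3 4 5
```

Callers who still use `count_to()` get a notice in their logs, which is a gentler reminder than a sudden fatal error on the day the method is finally removed.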
Use it Yourself
Depending on why you're creating your library, you may well be using it as you go. If you're creating the library because you need its functionality for a project you're working on, don't let this fool you into thinking you're testing the library. The project you're developing for will probably exercise only a limited number of the library's uses. When you wrote your goal, you probably defined places where the library would be useful, and you need to test all of them, even the ones that you don't need for your current project.
It helps to document what has been thoroughly tested and what hasn't. It's a good idea to write a script that tests all aspects of your library, and to run it every time you make changes. Failure to test wastes the time of other programmers. If someone else points out a failure, add a test for it to the script.
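A test script does not need a framework to be useful. The sketch below is one minimal way to write one, assuming a hypothetical `check()` helper and using a small `count_for()` function as the code under test; a real library would have a case for every use named in its goal.

```php
<?php
// Code under test (stands in for part of your library).
function count_for($start, $end, $increment = 1)
{
    $numbers = array();
    for ($i = $start; $i <= $end; $i += $increment) {
        $numbers[] = $i;
    }
    return $numbers;
}

// Hypothetical helper: compares actual and expected values for one
// labelled case and records the label if they differ.
function check($label, $actual, $expected, &$failures)
{
    if ($actual === $expected) {
        echo "ok   $label\n";
    } else {
        echo "FAIL $label\n";
        $failures[] = $label;
    }
}

$failures = array();
check('default increment', count_for(1, 3), array(1, 2, 3), $failures);
check('custom increment', count_for(0, 10, 5), array(0, 5, 10), $failures);
check('empty range', count_for(5, 1), array(), $failures);
// When someone reports a bug, add a case for it here before fixing it.

echo count($failures) === 0 ? "All tests passed\n" : count($failures) . " failed\n";
```

Running the script after every change turns "I think it still works" into a quick yes or no.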
Specific Guidelines
The two types of API
In my experience, there are two broad types of API. Keep in mind that I'm focusing completely on class APIs; this doesn't hold true for purely functional APIs.
I think it's easiest to describe the two types by giving examples:
$lib = new Library;
$lib->setVar1($lib->getSomeValue());
$lib->setVar2('value2');
$lib->setVar3('value3');
$lib->doSomething();
In this first style, which I'll call the “preset” approach, var1, var2, and var3 must be set before doSomething() can be called, and doSomething() actually uses those stored values.
The other method is more like the way an API is written for a functional language. The same library written in this manner would be used thus:
$lib = new Library;
$lib->doSomething($lib->getSomeValue(), 'value2', 'value3');
Despite the potential problems, I prefer the functional approach. I think the most important thing to do is pick one of these methods and stick with it for the entire API of your library.
Some might claim that I've made the library too complicated in the “preset” example. Why not just set the class variables directly? It would make things shorter, and the class code would be less complex if I replaced the methods above with something like:
$lib->var2 = 'value2';
The problem is that this is very self-limiting. If you decide you want to validate the data placed into var2, you can easily add such a feature to the setter method. If you don't use a method to set var2, then you can't do data validation until doSomething() is called. Additionally, you can never be sure how you'll want to store that value internally, and telling implementers to set the value directly means their programs must change whenever you alter your library's internals. By using a method, your change is transparent to everyone who uses your library.
This is a current shortcoming of PHP's object model. It doesn't implement ties (in the Perl sense), which would allow you to set the value directly and still validate it or change its internal representation. Knowing this, the best way to avoid problems is to use methods to set your class variables.
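Here is a minimal sketch of the setter approach, using a hypothetical `Library` class. `setVar2()` rejects anything that is not a non-empty string, and the class keeps the value in an internal array; because callers only ever go through the methods, that storage could change later without breaking them. The validation rule itself is an arbitrary example.

```php
<?php
// Hypothetical Library class: callers use setVar2()/getVar2() and never
// touch the internal storage directly.
class Library
{
    // Internal representation ("private" by the underscore convention);
    // free to change without affecting callers.
    var $_settings = array();

    // Validates before storing. Returns false, and stores nothing, if the
    // value is not a non-empty string.
    function setVar2($value)
    {
        if (!is_string($value) || trim($value) === '') {
            return false;
        }
        $this->_settings['var2'] = trim($value);
        return true;
    }

    function getVar2()
    {
        return isset($this->_settings['var2']) ? $this->_settings['var2'] : null;
    }
}

$lib = new Library;
var_dump($lib->setVar2('  value2 ')); // bool(true)
var_dump($lib->setVar2(''));          // bool(false)
echo $lib->getVar2(), "\n";           // value2
```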
Creating Methods
Creating methods that are truly scalable and useful is not always easy. This is by no means an exhaustive explanation of how to achieve API perfection. I won't even pretend that I know all the answers; I'm just going to share what I've learned so far.
It's best to start developing your API from the bottom up. Start out with the most basic tasks that need to be accomplished and write the code for them. Many of these will be private methods (methods the user of the API should never call), and here we hit one of the shortcomings of the PHP object model: PHP doesn't truly support private methods. The PEAR coding standards recommend prefixing methods that should be private with an underscore. This is a fair workaround, and in practice the lack of private methods seldom causes problems.
For example, in the case of a system that uses a database, the first methods you'll want to create are ones that let you access the database without knowing that it's a database. Methods such as getRecordByName() or getRecordById() are a good start. Even better would be something more specific, for example: getCustomerById().
Once you have these, writing the next layer up becomes easier. A method like getAllDelinquentCustomers() might be able to use the getCustomerByDueDate() method. Next thing you know, you've written a generateDelinquentReport() method with only five lines of code.
You'll need to refine your API from the top down. If you think I contradicted myself, look again. You start developing from the bottom up, then you refine from the top down. Actually, the process will go back and forth as long as you develop your library, with the low-level methods suggesting ideas for higher-level methods, and high-level methods requiring new and revised low-level methods. Flow with this process like the programming monk you long to be. Stay aware of what your high-level functions are saying to you. "Build me a generic method to access this part of the class," they will say. "Exploit my code to create a more powerful wrapper method," the low-level methods will whisper. Listen to them. A great artist once believed that the artwork already existed within the stone, and he had but to reveal it. Programming is often the same way.
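A minimal sketch of that layering, under loud assumptions: the hypothetical `CustomerLib` class keeps its "database" in an in-memory array so it runs anywhere, reading `getCustomerByDueDate()` as "customers due before a given date" is my interpretation, and the closures need PHP 5.3 or later. The leading underscore on `_fetchWhere()` marks it private by the PEAR convention; each higher-level method is built only from the ones beneath it.

```php
<?php
// Hypothetical library built bottom-up. The in-memory $_rows array stands
// in for a real database table.
class CustomerLib
{
    var $_rows = array(
        array('id' => 1, 'name' => 'Ann',  'due' => '2024-01-10', 'paid' => false),
        array('id' => 2, 'name' => 'Bob',  'due' => '2024-03-01', 'paid' => true),
        array('id' => 3, 'name' => 'Cleo', 'due' => '2024-02-15', 'paid' => false),
    );

    // Lowest layer ("private"): return every row for which $test is true.
    function _fetchWhere($test)
    {
        return array_values(array_filter($this->_rows, $test));
    }

    // Next layer: specific queries that hide how the data is stored.
    function getCustomerById($id)
    {
        $rows = $this->_fetchWhere(function ($r) use ($id) { return $r['id'] == $id; });
        return $rows ? $rows[0] : null;
    }

    function getCustomerByDueDate($before)
    {
        // ISO dates (YYYY-MM-DD) compare correctly as strings.
        return $this->_fetchWhere(function ($r) use ($before) { return $r['due'] < $before; });
    }

    // Higher layers: built only from the methods beneath them.
    function getAllDelinquentCustomers($today)
    {
        $late = array();
        foreach ($this->getCustomerByDueDate($today) as $customer) {
            if (!$customer['paid']) {
                $late[] = $customer;
            }
        }
        return $late;
    }

    function generateDelinquentReport($today)
    {
        $lines = array();
        foreach ($this->getAllDelinquentCustomers($today) as $c) {
            $lines[] = $c['name'] . ' (due ' . $c['due'] . ')';
        }
        return implode("\n", $lines);
    }
}

$lib = new CustomerLib;
echo $lib->generateDelinquentReport('2024-03-31'), "\n";
// Ann (due 2024-01-10)
// Cleo (due 2024-02-15)
```

Swapping the array for a real database later means rewriting only `_fetchWhere()`; every method above it stays the same.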
Naming Methods
You always want your methods to have names that describe what they do, without being too long to type easily. While you're at it, please bring about world peace. That goal is often difficult or impossible to accomplish, but you should always keep it in mind when naming your methods, and push to come as close to perfection as you can without causing a brain hemorrhage.
Be consistent with naming. Don't name one function getAllRelevantData(), another get_all_unrelated_data(), and another fetch_data_important() within the same library. There are a number of style guides available, so you don't have to write your own. I recommend the one developed for PEAR. But if you can't agree with any pre-existing style guide, at least develop your own style and be consistent.
If you're not sure whether to make a method private or public, make it private. If you find out later that you made a mistake, a simple find-and-replace operation in your editor will correct the problem. If you make it public and realize that it wasn't a good idea, anyone who uses your library can be adversely affected when you change it.
File Trees
Exactly how this works out will differ a lot depending on the size and complexity of your library. You could, of course, put all the code for your library in a single file. There are advantages to this, such as easy distribution, but the larger the code base gets, the more disadvantages appear, and maintenance becomes a nightmare. It takes hours just to scroll through to find the method you want to work on. Also, it's hard to divide work among multiple programmers when the code is all in one file.
Unless you're sure that the library will stay small, you should probably plan on splitting your library across many files right from the start. Even if you're sure that the library will remain small, you might want to do this anyway. Things have a way of outgrowing original expectations.
Look at what your code does and try to imagine a logical division of labor that will allow you to easily decide what code should go in what file. You might have alternate parts of your code that can be included at run time. For example, if you want to store files, you may have the option to store them on the file system, or in a database. Then you can split the methods between three files as follows:
class.php
database.inc.php
filesystem.inc.php
Making it work internally can be done a number of ways. My favorite is to wrap all the include code in an additional class and make it a property of the main class. Here's an example:
// File class.php
class libraryMain {
    var $typeclass;
    function libraryMain($type = 'filesystem')
    {
        // Load only the storage back end that was asked for.
        // Both include files define a class named storeClass.
        switch ($type) {
            case 'filesystem':
                require_once('filesystem.inc.php');
                break;
            case 'database':
                require_once('database.inc.php');
                break;
            default:
                trigger_error("Unknown storage type: $type", E_USER_ERROR);
        }
        $this->typeclass = new storeClass;
    }
    // Delegate to whichever back end was loaded.
    function getFile($filename)
    {
        return $this->typeclass->getFile($filename);
    }
}
// File database.inc.php
class storeClass {
    function getFile($filename)
    {
        // Code to retrieve a file from the database
    }
}
// File filesystem.inc.php
class storeClass {
    function getFile($filename)
    {
        // Code to retrieve a file from the filesystem
    }
}
If you don't have any optional code, you can still benefit from dividing your code base into different files. Consider grouping your methods into broad categories by what they accomplish or how they function. You might put all your low-level methods in one file and your high-level methods in another. You might put all the methods that deal with files in a file called file.inc.php, while all the methods that deal with database records are in db.inc.php and all the methods that manipulate the data are in manip.inc.php.
Plan to Expand
One of the beautiful things about high-level languages like PHP is the ability to move data around within the code. There are no pointers to fret over or memory management to worry about. Take advantage of this feature-filled language and pass the power on to users of your library.
Never return a single value when you can return an array. Never return an array when you can return an object. Seriously. The beautiful thing about arrays and objects is their ability to expand. What if your function generates an error? One way to handle it is to return false and have a method that retrieves the last error. This is implemented something like this:
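$o = new Library;
if (!$o->someMethodCall()) {
    echo $o->getLastError();
}
That works, but the caller gets nothing beyond a message. Returning an object instead gives the caller much more to work with, and the object can gain new abilities later:
$o = new Library;
$r = $o->someMethodCall();
if ($r->gotError()) {
    $r->tellErrorToUser();
    $r->resetClassToHandleError();
} else {
    $r->doWhatYouWouldDoIfSuccess();
}
An array can carry the same kind of extra information, such as separate messages for users and administrators:
$o = new Library;
$r = $o->someMethodCall();
if ($r['error']) {
    echo $r['usererrormsg'];
    if ($user == 'Admin') {
        echo $r['adminerrormsg'];
    }
} else {
    echo $r['value'];
}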
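The examples above show only the calling side. Here is one minimal way the object-returning version might look inside, assuming a hypothetical `Result` class and a hypothetical `divide()` method; the method names are trimmed-down stand-ins for those in the example, not a fixed API.

```php
<?php
// Hypothetical result object: carries a value or an error, plus room to grow.
class Result
{
    var $value = null;
    var $error = false;
    var $userErrorMsg = '';
    var $adminErrorMsg = '';

    function gotError()
    {
        return $this->error;
    }
}

class Library
{
    // Returns a Result instead of a bare value or false.
    function divide($a, $b)
    {
        $r = new Result;
        if ($b == 0) {
            $r->error = true;
            $r->userErrorMsg = 'Sorry, that calculation could not be done.';
            $r->adminErrorMsg = "divide() called with \$b = 0 (\$a = $a)";
        } else {
            $r->value = $a / $b;
        }
        return $r;
    }
}

$o = new Library;
$r = $o->divide(10, 0);
if ($r->gotError()) {
    echo $r->userErrorMsg, "\n";
} else {
    echo $r->value, "\n";
}
```

Adding, say, an error code later means adding one property to `Result`; no existing caller has to change.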
This works in the other direction as well. Think about the arguments to your method and consider the possibility that an array might be better than a long list of values, especially if you have a lot of optional values. For example:
function setUpShop($name, $size = 'large', $color = 'red', $angle = 0, $scale = 1, $language = 'en')
Obviously, the only required value is the name, while the remaining values will be assigned defaults if not specified. But what if the person using this function needs to specify a non-default angle but leave the default size and color? There's no way to do it without also passing size and color explicitly, which means repeating their defaults. You could alternately define the function as:
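function setUpShop($name, $parameters = array())
{
    // Fill in a default for every parameter the caller did not supply
    if (!isset($parameters['size']))     $parameters['size']     = 'large';
    if (!isset($parameters['color']))    $parameters['color']    = 'red';
    if (!isset($parameters['angle']))    $parameters['angle']    = 0;
    if (!isset($parameters['scale']))    $parameters['scale']    = 1;
    if (!isset($parameters['language'])) $parameters['language'] = 'en';
    // ... set up the shop using $name and $parameters ...
}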
There are disadvantages and advantages to both approaches. In the second one, the programmer has to put the values into the array prior to calling the function, which can be somewhat tedious. The best time to use the second approach is when there are a lot of potential values, and you can't be sure which ones the user will want to set, and which ones will be left as defaults. The first method works best if you know that the presence of value three will always require value two, and so forth. Another advantage to the second method is that you can add parameters to the method call without changing the API, so if you're not sure what parameters you might need in the long run, the second method is probably better.
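One common way to tidy the second approach, sketched here rather than prescribed, is to keep all the defaults in one array and let `array_merge()` overlay whatever the caller supplied. The sketch returns the merged settings only so you can see the effect; a real `setUpShop()` would go on to use them.

```php
<?php
// Sketch: every default lives in one array; the caller overrides only
// the keys it cares about.
function setUpShop($name, $parameters = array())
{
    $defaults = array(
        'size'     => 'large',
        'color'    => 'red',
        'angle'    => 0,
        'scale'    => 1,
        'language' => 'en',
    );
    // With string keys, array_merge() lets values from the later array
    // replace those from the earlier one, so the caller's settings win.
    $settings = array_merge($defaults, $parameters);
    $settings['name'] = $name;
    // ... set up the shop using $settings ...
    return $settings;
}

// A non-default angle with the default size and color is now easy.
$shop = setUpShop('Corner Store', array('angle' => 45));
echo $shop['angle'], ' ', $shop['size'], ' ', $shop['color'], "\n"; // 45 large red
```

One trade-off: a misspelled key such as `'colour'` is carried along as an extra entry rather than reported, so you may want to check the keys with `array_diff_key()` if that matters for your library.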
Beware of Infinity
The only major error I know of in the PHP documentation is where the database documentation says that you don't have to free result sets after use, since PHP will free the memory automatically when the script terminates. This is bad advice all around, but especially bad for libraries.
Always free large amounts of memory when you're through using it. Database result sets are a good example. (Use the preferred function, such as pg_free_result().) Variables that end up containing large arrays or long strings should be unset() when you know you won't need them again.
This seems to contradict what I said earlier about not having to worry about pointers and memory management, but it's important. No matter how big and powerful newer computers get, they're still working with limited resources. And you never know how many times a particular script might legitimately call your class methods. You might be surprised how quickly repeated database result sets can eat up all the memory on a web server. Large arrays and long string values aren't as bad because the memory is garbage collected when they go out of scope, but keep your eyes open for problems. Database connections can cause trouble as well, since most database servers limit the number of connections that can be made. If possible, reuse the same connection. If not, be sure to close the connection when you're done with it.
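Reusing one connection can be as simple as caching it the first time it is opened. The sketch below assumes a hypothetical `openConnection()` standing in for your database's real connect call (for example `pg_connect()`), counts how many times it actually runs, and ends with `unset()` on a large array to show memory being released as soon as it is no longer needed.

```php
<?php
// Counts how often a connection is really opened (for demonstration only).
$connectCount = 0;

// Hypothetical stand-in for a real connect call such as pg_connect().
function openConnection()
{
    global $connectCount;
    $connectCount++;
    return new stdClass; // pretend this object is a live connection
}

// Opens the connection on first use, then hands back the same one.
function getConnection()
{
    static $conn = null;
    if ($conn === null) {
        $conn = openConnection();
    }
    return $conn;
}

for ($i = 0; $i < 100; $i++) {
    $db = getConnection(); // only the first call opens a connection
}
echo $connectCount, "\n"; // 1

// Large values can be released as soon as you are done with them.
$rows = range(1, 100000);
$total = array_sum($rows);
unset($rows); // free the array now instead of at the end of the script
```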
Conclusion
I haven't covered everything there is to say about writing libraries. Then again, I don't feel that could be done without writing an entire book.
I feel that the most important points are consistency and documentation. Without consistency, a library is difficult to use, and lacks the “polish” that many people expect. Without documentation, well, there's little chance that anyone will be able to use the library.
