Intended Audience
Introduction
General Rules
Define the Goal
Document
Embrace Change
Use it Yourself
Specific Guidelines
The two types of API
Creating Methods
Naming Methods
File Trees
Plan to Expand
Beware of Infinity
Conclusion
Intended Audience
This tutorial is intended for PHP programmers who are
interested in writing reusable code. Experience with PHP4 and familiarity with
writing classes are assumed. The article is primarily geared toward programmers
who are new to writing code libraries, but experienced programmers may find the
article informative as well.
Introduction
The core of any programming language is its API. The API (application programming
interface) is the set of available functions that allow a programmer to use the
programming language. Compare an API to the language’s syntax, which is the rules
for stringing the various commands together so the interpreter can understand
them.
Most of the documentation on PHP is focused on explaining the functioning of the
various APIs PHP has available. Likewise, if you create a PHP library, most of
the work involved is usually in sorting out how the API will work.
When you code a library, your primary goal is to write code that can be reused.
You might be concerned with code reuse within a single application or maybe you
want to publish the library for others to use. Taking the time to code your library
properly results in code that often becomes more useful than you initially expected,
and reduces the time it takes you to create the rest of your application. It also
makes it easier to involve other programmers in your project. Obviously, a well-written
library is much more likely to be useful to other programmers than one that was
created carelessly.
Libraries generally take on two forms: functions and classes. PHP is a language
that I generally classify as “hybrid” because it has most of the capabilities
of an object-oriented language, yet the native PHP API is functional in nature.
The shortcomings of PHP’s object programming are not bad enough to warrant avoidance
of writing class libraries. In fact, the examples and explanations in this article
are based completely on class libraries, so I’ll be covering some of the workarounds
for those shortcomings.
General Rules
Define the Goal
The first thing to do when writing a class library is to determine the goal of
the library (actually, this should be done before starting any kind of project,
whether programming or not). I could go into a long discussion of exactly how
to define the goal and exactly how much detail is required, but suffice it to
say that more is better.
I’ve seen this called “defining the ideal scene”. You build a little utopia
in your mind and imagine what your library would do if it were perfect. Identify
key elements to this success. Is it simple? Is speed a critical concern? Must
it accept many different types of input or can you establish firm guidelines for
input? Security is always a concern, so will your library handle it or will it
be up to the application that uses the library? Defining what the library will
not do is often just as important as defining what it will do.
Be sure to document the goal. Depending on how your project is organized, you
might include it in a README file. Documenting the goal helps keep you focused
on what’s important. Everyone has time constraints, and focusing on what’s important
allows you to make the best use of your time. It also keeps you from turning the
library into something it was never intended to be. The term “feature creep” was
coined by people who either never documented a goal for their project, or ignored
the goal after they started working. Documenting the goal is also important if
the library is going to be made public. It keeps you from getting requests for
features that don’t belong, and avoids misunderstanding.
Document
Begin documenting immediately after you determine what your goal is. Actually,
documenting the goal of your library bleeds right over into documenting the rest
of the library.
Some people think that a library should be documented after it is written.
I disagree. I think the major problem with a lot of software on the market is
poor documentation. This goes for open-source software as well, and can be crippling
to the success of a library of any type.
Documenting prior to development seems to help in other ways as well. Getting
your ideas down in documentation form can speed actual code development considerably.
It also helps in the same way as having a goal defined, by keeping you (and anyone
else who may be helping with development) on the right track.
Embrace Change
If you can get the API for your library correct on the first try, please let me
know so I can lay a tribute at your feet. For the rest of us mortals, I believe
that the easiest way to accept change is to see it as a magical experience by
which the code improves. This gives you a psychological edge over “Darn, I’ve got
to change the code again”.
But embracing change is much more than just a positive frame of mind. It is an
all-aspects bear hug of involvement. When an aspect of your library changes, you
need to do three things right away:
1. Update the library itself to implement the change
2. Update the documentation to reflect the change
3. Plan for compatibility problems
When you update the documentation, it is good to describe how the function used
to work, mention the exact version where the change took place, and detail the
nature of the change and the new behavior.
Number three can be accomplished a number of ways. If you’re aware that a lot
of code already uses the old method, you should probably create a brand-new method
that implements the new system while leaving the old one intact. You can often
use a difference in the name to indicate the change. Unfortunately, this can lead
to code bloat and the necessity of supporting old code longer than you would like.
The best way to avoid this is to prominently document the fact that the old way
is deprecated and clearly state the version in which it will be removed from
the library. Be sure to give people a few versions to make the switch.
Compatibility can often be accomplished with wrappers around the new methods that
mimic the behavior of the old methods. For example:
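// Both functions are methods of the same class.

// Old method, kept as a wrapper so existing callers keep working.
function count_to($start, $increment, $end)
{
    $this->count_for($start, $end, $increment);
}

// New method: the increment is optional and defaults to 1.
function count_for($start, $end, $increment = 1)
{
    for ($i = $start; $i <= $end; $i += $increment) {
        echo $i;
    }
}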
As you can see in this fictional example, count_to has been replaced with
count_for. The new function is simpler, since a case where the increment is one
(which is fairly common) does not require a third argument. By wrapping the new
function with the old we reduce code maintenance and bloat, while still
maintaining 100% compatibility with the old method.
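If you also want callers to find out at run time that the old method is on its
way out, the wrapper can raise a notice. What follows is only a sketch under a
few assumptions of my own: the Counter class name is invented, the methods
return their output as a string instead of echoing it so the result is easy to
check, and trigger_error() with E_USER_NOTICE is just one way to raise the
warning.

```php
<?php
// Hypothetical Counter class; only the count_to/count_for names come from the article.
class Counter {
    // Old signature, kept as a thin wrapper for existing callers.
    function count_to($start, $increment, $end)
    {
        // Tell the caller, at notice level, that this method is going away.
        trigger_error('count_to() is deprecated; use count_for() instead', E_USER_NOTICE);
        return $this->count_for($start, $end, $increment);
    }

    // New signature: the increment is optional and defaults to 1.
    // Returns the counted values as one string (an assumption made for this sketch).
    function count_for($start, $end, $increment = 1)
    {
        $out = '';
        for ($i = $start; $i <= $end; $i += $increment) {
            $out .= $i;
        }
        return $out;
    }
}
```

Old code that calls count_to() keeps getting the same result as before, while
the notice gives its author a nudge to switch before the method is removed.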
Change is a large and very important topic. I could probably write an entire article
on how to know when to change something and when not to and (obviously) there
would still be no way to know for sure that the decision was correct. I think
the important thing is to consider every change with regard to the goal of the
project, and decide whether the change makes the goal more likely to be accomplished
or not. This is the primary gauge for knowing whether or not to make a change.
Use it Yourself
Depending on why you're creating your library, you may well be using it as you
go. If you're creating the library because you need the functionality it provides
for a project you're working on, don't let this fool you into thinking you're
testing the library. The project you're developing for is probably only going
to stress a limited number of uses of the library. When you wrote your goal, you
probably defined places where the library would be useful, and you need to test
all of them, even the ones that you don't need for your current project.
It helps to document what has been thoroughly tested and what hasn't. It’s
a good idea to write a script that tests all aspects of your library, and every
time you make changes, check them against this script. Failure to test wastes the
time of other programmers. If someone else points out a failure, add tests that
cover it to the script.
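A test script does not have to be fancy. Here is a minimal sketch of the idea;
the check() and failedChecks() helpers are hypothetical, and addTax() is only a
stand-in for whatever your library really does.

```php
<?php
// Hypothetical helper: records one pass/fail result per check.
function check($description, $condition, &$results)
{
    $results[] = array('description' => $description, 'passed' => (bool) $condition);
}

// Hypothetical helper: returns the descriptions of the checks that failed.
function failedChecks($results)
{
    $failed = array();
    foreach ($results as $result) {
        if (!$result['passed']) {
            $failed[] = $result['description'];
        }
    }
    return $failed;
}

// Stand-in for a library function under test (not from the article).
function addTax($amount, $rate)
{
    return round($amount * (1 + $rate), 2);
}

$results = array();
check('adds 10% tax', addTax(100, 0.10) == 110.00, $results);
check('zero rate leaves the amount unchanged', addTax(50, 0) == 50.00, $results);

// Report a one-line summary so the script is quick to run after every change.
$failed = failedChecks($results);
echo count($failed) == 0 ? "All checks passed\n" : 'Failed: ' . implode(', ', $failed) . "\n";
```

Each time someone reports a problem, add one more check() line that reproduces
it, and the script grows into a record of everything that has gone wrong so far.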
Specific Guidelines
The two types of API
In my experience, there are two broad types of API. Keep in mind that I'm focusing
completely on class APIs; this doesn't hold true for purely functional APIs.
I think it's easiest to describe the two types by giving examples:
$lib = new Library;
$lib->setVar1($lib->getSomeValue());
$lib->setVar2('value2');
$lib->setVar3('value3');
$lib->doSomething();
I call this the “preset” approach. (There may be some other, nicer sounding,
computer science degree savvy name for it, but I don't know what it is.)
Microsoft seems to use this model in many of their APIs. It's characterized by a
method of use that involves setting a number of class values and then calling a
method to act on those values. In the example above, var1, var2, and var3 must
be set before doSomething() can be called, and doSomething() is actually using
those values.
The other method is more like the way an API is written for a functional language.
The same library written in this manner would be used thus:
$lib = new Library;
$lib->doSomething($lib->getSomeValue(), 'value2', 'value3');
There are advantages and disadvantages to both methods. For methods that have few
operands, the functional style is usually simpler and shorter. Reading the code
is easier too, since all the pertinent information is in one place. In the
“preset” method, the setting of the class variables might occur much
earlier in the code, and it might not be immediately obvious what those values
were set to. On the other hand, the “preset” method stays consistent
even when the values are complicated to acquire. For example, if the first value
required a number of method calls and a lot of math operations to calculate, the
functional call would become unwieldy unless you stored that value in a temporary
variable first, whereas the preset code would not change very much.
Despite the potential problems, I prefer the functional approach. I think the
most important thing to do is pick one of these methods and stick with it for
the entire API of your library.
Some might claim that I've made the library too complicated in the “preset”
example. Why not just set the class variables directly? It would make things shorter,
and the class code would be less complex if I replaced the methods above with
something like:
$lib->var2 = 'value2';
The problem is that this is very self-limiting. If you decide you want to validate
the data placed into var2, you can easily add such a feature in the method. If you
don't use a method to set var2, then you can't do data validation until
doSomething() is called.
Additionally, you can never be sure how you'll want to store that value
internally, and directing an implementer to set the value directly will require
a change to the program if you alter your library. By using a method, your
change is transparent to everyone who uses your library.
This is a current shortcoming of PHP's object model. It doesn't implement ties,
which would allow you to set the value directly and still validate it or internally
alter the representation. Knowing this, the best way to avoid problems is to use
methods to set your class variables.
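To make the point concrete, here is a rough sketch of a setter that validates
its input. The ValidatingLibrary class, the non-empty-string rule, and the
underscore-prefixed properties are all assumptions of mine for illustration.

```php
<?php
// Hypothetical class illustrating a validating setter.
class ValidatingLibrary {
    var $_var2 = null;      // leading underscore: private by convention only
    var $_lastError = '';

    // Returns true if the value was accepted, false if it was rejected.
    function setVar2($value)
    {
        // The validation rule is invented for this sketch.
        if (!is_string($value) || $value === '') {
            $this->_lastError = 'var2 must be a non-empty string';
            return false;
        }
        $this->_var2 = $value;
        return true;
    }

    function getVar2()
    {
        return $this->_var2;
    }

    function getLastError()
    {
        return $this->_lastError;
    }
}
```

Because callers only ever go through setVar2() and getVar2(), the rule can be
tightened, or the internal storage changed, without touching their code.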
Creating Methods
Creating methods that are truly scalable and useful is not always easy. This is
by no means an exhaustive explanation on how to achieve API perfection. I won't
even pretend that I know all the answers; I'm just going to share what I've learned
so far.
It's best to start developing your API from the bottom up. Start out with the
most basic tasks that will need to be accomplished and write the code for them. Many
of these will be private methods (that the user of the API should never use) and
here is one of the shortcomings of the PHP object model. PHP doesn't truly support
private methods. The PEAR coding standards recommend that you precede methods
that should be private with an underscore. This is a fair workaround, and in practice
the lack of private methods seldom causes problems.
For example, in the case of a system that uses a database, the first methods that
you'll want to create will be ones that allow you to access the database without
knowing that it's a database. Methods such as getRecordByName() or getRecordById()
are a good start. Even better would be something more specific, for example:
getCustomerById(). Once you have these, writing the next layer up becomes easier.
A method like getAllDelinquentCustomers() might be able to use the
getCustomerByDueDate() method. Next thing you know, you've written a
generateDelinquentReport() method with only five lines of code.
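The layering might look something like the sketch below. It only loosely follows
the method names mentioned above: the CustomerLibrary class, addCustomer(), and
getCustomersDueBy() are hypothetical, and an in-memory array stands in for the
database so the example runs on its own.

```php
<?php
// Hypothetical customer library; an array stands in for the database.
class CustomerLibrary {
    var $_customers = array();

    // Lowest layer: the only methods that know how the data is stored.
    function addCustomer($id, $name, $dueDate, $balance)
    {
        $this->_customers[$id] = array(
            'id' => $id, 'name' => $name, 'dueDate' => $dueDate, 'balance' => $balance
        );
    }

    function getCustomerById($id)
    {
        return isset($this->_customers[$id]) ? $this->_customers[$id] : null;
    }

    // Customers whose due date is on or before $date (YYYY-MM-DD strings).
    function getCustomersDueBy($date)
    {
        $due = array();
        foreach ($this->_customers as $customer) {
            if ($customer['dueDate'] <= $date) {
                $due[] = $customer;
            }
        }
        return $due;
    }

    // Next layer up: built entirely on the method above.
    function getAllDelinquentCustomers($today)
    {
        $delinquent = array();
        foreach ($this->getCustomersDueBy($today) as $customer) {
            if ($customer['balance'] > 0) {
                $delinquent[] = $customer;
            }
        }
        return $delinquent;
    }

    // Top layer: the report needs only a handful of lines.
    function generateDelinquentReport($today)
    {
        $lines = array();
        foreach ($this->getAllDelinquentCustomers($today) as $customer) {
            $lines[] = $customer['name'] . ' owes ' . $customer['balance'];
        }
        return implode("\n", $lines);
    }
}
```

Notice that only the bottom three methods would need to change if the array
were swapped for real database queries.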
You'll need to refine your API from the top down. If you think I contradicted
myself, look again. You start developing from the bottom up, then you refine from
the top down. Actually the process will go back and forth as long as you develop
your library, with the low-level methods suggesting ideas for higher level methods,
and high-level methods requiring new and revised low-level methods. Flow with
this process like the programming monk you long to be. Keep aware of what your
high-level functions are saying to you. "Build me a generic method to access this
part of the class," they will say. "Exploit my code to create a more powerful
wrapper method," the low-level methods will whisper. Listen to them. A great artist
once believed that the artwork already existed within the stone, and he had but
to reveal it. Programming is often the same way.
Naming Methods
You always want your methods to have names that describe what they do, without
being too long to type easily. While you're at it, please bring about world peace.
That goal is often difficult or impossible to accomplish, but you should always
keep it in mind when naming your method, and push to come as close to perfection
as you can without causing a brain hemorrhage.
Be consistent with naming. Don't name one function getAllRelevantData()
and another get_all_unrelated_data() and another fetch_data_important() within
the same library. There are a number of style guides available so you don't have
to write your own. I recommend the one developed for PEAR. But if you can't agree
with any pre-existing style guide, at least develop your own style and be consistent.
If you're not sure whether to make a method private or public, make it private.
If you find out that you made a mistake later, a simple FIND/REPLACE operation
within your editor will correct the problem. If you make it public and realize
that it wasn't a good idea, anyone who uses your library can be adversely affected
when you change it.
File Trees
Exactly how this works out will differ a lot depending on the size and complexity
of your library. You could, of course, put all the code for your library in a
single file. There are advantages to this, such as easy distribution, but the
larger the code base gets, the more disadvantages appear, and maintenance becomes
a nightmare. It takes hours just to scroll through to find the method you want
to work on. Also, you can't divide work among multiple programmers because the
code is all in one file.
Unless you're sure that the library will stay small, you should probably plan
on having many files to your library right from the start. Even if you're sure
that the library will remain small, you might want to do this anyway. Things have
a way of outgrowing original expectations.
Look at what your code does and try to imagine a logical division of labor that
will allow you to easily decide what code should go in what file. You might have
alternate parts of your code that can be included at run time. For example, if
you want to store files, you may have the option to store them on the file system,
or in a database. Then you can split the methods between three files as follows:
class.php
database.inc.php
filesystem.inc.php
This helps in many ways. You don't
have nearly as many if-then blocks in the code. PHP has to parse less. If you
want to rework the file system code, you don't risk breaking the database code
while you're doing it. You get the idea.
Making it work internally can be done a number of ways. My favorite is to wrap
all the include code in an additional class and make it a property of the main
class. Here's an example:
// File class.php
class libraryMain {
    var $typeclass;

    function libraryMain($type = 'filesystem')
    {
        switch ($type) {
            case 'filesystem' :
                require_once('filesystem.inc.php');
                break;
            case 'database' :
                require_once('database.inc.php');
                break;
        }
        $this->typeclass = new storeClass;
    }

    function getFile($filename)
    {
        return $this->typeclass->getFile($filename);
    }
}

// File database.inc.php
class storeClass {
    function getFile($filename)
    {
        // Code to retrieve a file from the database
    }
}

// File filesystem.inc.php
class storeClass {
    function getFile($filename)
    {
        // Code to retrieve a file from the filesystem
    }
}
Aside from the fact that this oversimplified code is
lacking basic error checking, it will allow you to easily use both file
system-based and database-based file storage. There are other ways to accomplish
this, however I'll leave them as an exercise to the reader.
If you don't have any optional code, you can still benefit by dividing your code
base into different files. Consider grouping your methods into broad categories
by what they accomplish or how they function. You might put all your low-level
methods in one file, and your high-level methods in another. You might put all
the methods that deal with files in a file called file.inc.php
while all the methods that deal with database records are in db.inc.php
and all the methods that manipulate the data are in manip.inc.php.
Plan to Expand
One of the beautiful things about high-level languages like PHP is the ability
to move data around within the code. There are no pointers to get frustrated over,
or memory management to worry about. Take advantage of this feature-filled language
and pass the power on to users of your library.
Never return a single value when you can return an array. Never return an array
when you can return an object. Seriously. The beautiful thing about arrays and
objects is their ability to expand. What if your function generates an error?
One way to handle it is to return false and have a method that retrieves the last
error. This is implemented something like this:
$o = new Library;
if (!$o->someMethodCall()) {
    echo $o->getLastError();
}
This works, and in many cases is the most practical
approach. But don't fail to consider the following enhancement:
$o = new Library;
$r = $o->someMethodCall();
if ($r->gotError()) {
    $r->tellErrorToUser();
    $r->resetClassToHandleError();
} else {
    $r->doWhatYouWouldDoIfSuccess();
}
To say that all methods should return an object
is insane, but always consider the possibility. Arrays work almost as nicely.
Consider the following example:
$o = new Library;
$r = $o->someMethodCall();
if ($r['error']) {
    echo $r['usererrormsg'];
    if ($user == 'Admin') {
        echo $r['adminerrormsg'];
    }
} else {
    echo $r['value'];
}
Each of the three examples does something a little
different to illustrate some of the advantages of each method. Don't blindly use
objects or arrays for all return values. The best way to decide is to consider
what the implementer will be doing with the returned information and use that as
a gauge. If they'll be using a lot of methods on it every time they acquire it,
an object may be the best thing to return. If there are a number of values that
may need to be returned, but you can't be sure how they'll be used, an array is
probably best. If the method simply does something and needs to indicate success
or failure, then a single value will suffice.
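For the array approach, it helps if the library always returns the same keys,
whether or not something went wrong. Here is one way that might look from the
library's side. The SettingsLibrary class and its getSetting() method are
hypothetical; only the array keys are borrowed from the example above.

```php
<?php
// Hypothetical library whose method always returns an array with the same keys.
class SettingsLibrary {
    var $_settings = array('color' => 'red');

    function getSetting($name)
    {
        // Start from a complete result so callers can rely on every key existing.
        $result = array(
            'error'         => false,
            'usererrormsg'  => '',
            'adminerrormsg' => '',
            'value'         => null
        );
        if (!isset($this->_settings[$name])) {
            $result['error'] = true;
            $result['usererrormsg'] = 'That setting is not available.';
            $result['adminerrormsg'] = "Unknown setting '" . $name . "' requested.";
            return $result;
        }
        $result['value'] = $this->_settings[$name];
        return $result;
    }
}
```

If you later need to return more information, such as an error code, you can add
a key without breaking any code that already uses the method.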
This works in the other direction as well. Think about the arguments to your method
and consider the possibility that an array might be better than a long list of
values, especially if you have a lot of optional values. For example:
function setUpShop($name, $size = 'large', $color = 'red', $angle = 0, $scale = 1, $language = 'en')
Obviously, the only required value is the name, while the remaining values will be
assigned defaults if not specified. But what if the person using this function
needs to specify a non-default angle, but leave the default size and color?
There's no way to do it without also passing the size and color explicitly. You
could alternately define the function as:
function setUpShop($name, $parameters = array())
{
    if (!isset($parameters['size'])) $parameters['size'] = 'large';
    if (!isset($parameters['color'])) $parameters['color'] = 'red';
    // ... etc ...
}
This way, the user can specify only the
parameters needed, and the rest will be set to defaults. It's also possible to
provide an object as a parameter, although the advantages are fewer.
There are disadvantages and advantages to both approaches. In the second one,
the programmer has to put the values into the array prior to calling the function,
which can be somewhat tedious. The best time to use the second approach is when
there are a lot of potential values, and you can't be sure which ones the user
will want to set, and which ones will be left as defaults. The first method works
best if you know that the presence of value three will always require value two,
and so forth. Another advantage to the second method is that you can add parameters
to the method call without changing the API, so if you're not sure what parameters
you might need in the long run, the second method is probably better.
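If the list of isset() checks in the second approach gets long, one alternative
is to keep the defaults in an array of their own and merge the caller's values
over them. This is a sketch, not the only way to do it; returning the merged
settings is my own assumption, made so the effect is easy to see.

```php
<?php
// Sketch of the array-parameter approach using array_merge() for the defaults.
function setUpShop($name, $parameters = array())
{
    $defaults = array(
        'size'     => 'large',
        'color'    => 'red',
        'angle'    => 0,
        'scale'    => 1,
        'language' => 'en'
    );
    // With string keys, later arrays win in array_merge(),
    // so the caller's values override the defaults.
    $settings = array_merge($defaults, $parameters);
    $settings['name'] = $name;

    // Returned here only for illustration; a real method would act on the settings.
    return $settings;
}
```

A call such as setUpShop('Corner Shop', array('angle' => 45)) changes only the
angle and leaves every other value at its default.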
Beware of Infinity
The only major error I know of in the PHP documentation is where the database
documents say that you don't have to free result sets after use, since PHP will free
the memory automatically when the script terminates. This is bad advice all around,
but especially bad for libraries.
Always free large amounts of memory when you're through using it. Database
result sets are a good example. (Use the preferred function, such as pg_free_result().)
Variables that end up containing large arrays or long strings should be unset()
when you know you won't need them again.
This seems to contradict what I said earlier about not having to worry about pointers
and memory management, but it's important. No matter how big and powerful newer
computers get, they're still working with limited resources. And you never know
how many times a particular script might legitimately call your class methods.
You might be surprised how quickly repeated database result sets can eat up all
the memory on a web server. Large arrays and long string values aren't as bad
because the memory is garbage collected when they go out of scope, but keep your
eyes open for problems. Database connections can cause trouble as well, since
most database servers have a limit to the number of connections that can be made.
If possible, reuse the same connection. If not, be sure to close the connection
when you're done with it.
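Reusing a connection usually comes down to opening it the first time it is needed
and handing back the same one afterwards. The sketch below shows the pattern
only: ConnectionHolder and _openConnection() are hypothetical, and a plain string
stands in for the resource a real driver call would return.

```php
<?php
// Hypothetical holder that opens a connection once and reuses it afterwards.
class ConnectionHolder {
    var $_connection = null;
    var $_timesOpened = 0;

    // Stand-in for a real driver call; the string is a placeholder for a resource.
    function _openConnection()
    {
        $this->_timesOpened++;
        return 'connection #' . $this->_timesOpened;
    }

    // Opens the connection on first use, then returns the same one every time.
    function getConnection()
    {
        if ($this->_connection === null) {
            $this->_connection = $this->_openConnection();
        }
        return $this->_connection;
    }

    // A real library would call the driver's close function here before forgetting it.
    function close()
    {
        $this->_connection = null;
    }
}
```

Every method in the library that needs the database can then call
getConnection() without worrying about whether one is already open.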
Conclusion
I haven't covered everything there is to say about writing libraries. Then again,
I don't feel that could be done without writing an entire book.
I feel that the most important points are consistency and documentation. Without
consistency, a library is difficult to use, and lacks the “polish”
that many people expect. Without documentation, well, there's little chance that
anyone will be able to use the library.
About The Author
Bill Moran works for Potential Technologies
and has been helping people get more out of their computers for over ten years.
He can be reached at wmoran@potentialtech.com.


One comment to “Writing Libraries in PHP”
January 8th, 2009 at 8:47 am
I found your article to be very concise and enlightening. Thank you for taking the time to share your knowledge with the rest of us. Could you recommend any good books on developing good API’s? Regards, Mike.