November 18, 2004
Overview
Introduction
Recursion Defined: When A Function Calls A Function
Iteration
Recursion in Context
A Recursive Example: Factorials
Another Example: File System Traversal
Conclusion: Appropriate Use
Further Reading
About the Author
Overview
I liken PHP to the Punk Rock movement of the late 1970s. The raw power of the movement has attracted tremendous participation. And just as anyone who could strum power chords and scream along to a melody could start their own punk band, likewise anyone who can search php.net and download sample code can learn to program PHP. The lack of barriers present in other languages - like licensing fees, proprietary approaches, and privately guarded examples - makes PHP all the more attractive and available to the programmer on the street. Frankly, this thrills me, because it is this conscious commitment to openness that has made PHP one of the most popular programming languages of our time.
The accessibility of PHP also means that the PHP community is composed of programmers from a very diverse blend of backgrounds, many of them self taught. Such programmers sometimes miss out on some of the finer points of programming that can be learned through academic training. Next to object oriented programming, normalization of relational data structures and refactoring, recursion is one of the topics in computer science most often overlooked by the self-taught PHP programmer. This article is an attempt to help redress this situation.
Introduction
In a recent survey I conducted (http://forums.dreamincode.net/index.php?showtopic=9158&hl=recursion), nearly half the PHP programmers I polled didn't really know what recursion was or how to use it. In this article, we will define recursion, contrast it with iteration, and step through two examples: factorials and file system traversal. We will also look at more advanced implications, such as the use of global variables and depth versus breadth traversal. Finally, we will clarify when it is most appropriate to use recursion and when iteration will suffice.
This article is aimed at novice to intermediate PHP programmers, particularly those without a formal background in computer science, who want to understand the full power of recursion.
Recursion Defined: When A Function Calls A Function
(To be sung to the tune of "When the Moon Hits Your Eye":) "When a function calls a function in a Turing machine ... that's recu-u-u-u-ursion!"
Your first question is probably, "What's this 'Turing machine' thing doing in place of the usual lyrics, 'big pizza pie'?" Don't worry about it. The short answer is to just consider that the PHP runtime environment is a Turing-complete environment - and we all know that the PHP runtime environment is almost as near and dear to a programmer's heart as pizza.
To put it simply: recursion means anything that references itself. In computer science, we mostly talk about recursive functions - functions that call themselves. We will get into some examples momentarily.
Iteration
You are probably familiar with the two most popular iterative constructs in PHP:
for and while. Here are two examples of iterating over an
array:
<?php
for ($i = 0; $i < sizeof($array); $i++) {
print $array[$i];
}
?>
<?php
$i = 0;
while ($i < sizeof($array)) {
print $array[$i];
$i++;
}
?>
Both examples print out all the elements of a one-dimensional array. Both examples are iterative.
Simply put, iteration is doing something inside a loop. If you have never heard of recursion, you probably never realized that there is an alternative approach. This is because it is relatively easy to solve simple problems using loops. Recursion, on the other hand, seems to involve a shift in thinking for some programmers. The shift is well worth it, however, since some problems can be solved more elegantly using recursion.
Recursion in Context
Some languages are designed with recursion in mind. A classic example is LISP. Most of a LISP programmer's life is spent writing recursive functions that traverse linked-list data structures. The language itself is tuned for recursion, so that recursive functions execute very quickly and efficiently.
PHP, by contrast, chooses to focus on iteration to do most of its heavy lifting. This is probably because it draws on the heritage of C and (increasingly) Java, which do the same. None the less, recursion is not only possible in these languages - sometimes it is desirable.
A Recursive Example: Factorials
Mathematically speaking, a factorial (denoted by "!") is the result of a number multiplied by all positive integers less than that number. Non-integer and negative values are considered undefined. Zero factorial (0!), however, is defined as 1.
Here is one way to write the factorial function using iteration:6! = 6 * 5 * 4 * 3 * 2 * 1 or 720
<?php
//iterative factorial function
function factorial($number) {
$result = 1;
while ($number > 0) {
print "result = $result, number = $number\n";
$result *= $number;
$number--;
}
return $result;
}
?>
Running print factorial(6); should produce the output:
As you can see, we start out with the default values ofresult = 1, number = 6 result = 6, number = 5 result = 30, number = 4 result = 120, number = 3 result = 360, number = 2 result = 720, number = 1 720
$result and
$number in the first line. Then we multiply $result by
$number minus one until $number reaches one.
Now let's do the same thing recursively:
<?php
//recursive factorial function
function factorial($number) {
if ($number < 2) {
return 1;
} else {
return ($number * factorial($number-1));
}
}
print factorial(6);
?>
As you can see, we take a totally different approach here. Instead of setting a
$result variable, we use the $number variable exclusively. How
can we get away with this? Instead of keeping track of a counter variable (like
$i in the for/while examples above), in simple
recursion we are keeping track of the operations on a single variable.
The if statement in this example serves two purposes. The first, obvious, purpose
is to return the value 1 if you call factorial(0) or
factorial(1) from somewhere else inside your code.
But there is a second, even more important, purpose to this statement. It serves as what
is called the end condition for the recursion. Just as the end condition for
ending the while loop is that $number is no longer greater than
zero, here the end condition is when $number is less than two.
Again, the best way to see what this function is doing is to add an output line. In this
case, let's add a more descriptive line that tells us about the recursive function call.
Just after the else clause, we add:
print "factorial($number) is ($number * factorial(".($number-1)."))\n";
Now, running print factorial(6); gives us:
Where is the running total? There isn't one. Instead of multiplying out afactorial(6) is (6 * factorial(5)) factorial(5) is (5 * factorial(4)) factorial(4) is (4 * factorial(3)) factorial(3) is (3 * factorial(2)) factorial(2) is (2 * factorial(1)) 720
$return value, we are simply multiplying the number by the factorial of the
number minus one. It is the multiplication that is aggregating our code all
together. For this reason, the multiplication operator (*) is said to be our
aggregator.
In the end, the last line evaluates to:
instead of the last line of the iterative version, which evaluated to:return 6 * 5 * 4 * 3 * 2 * 1;
Because this particular type of recursion is performing the recursive function call as the second part of the return line, it is called tail recursion. If we rewrote the line this way:return 720;
return (factorial($number-1) * $number);
we would be using head recursion, and in this case, would get an identical result. When dealing with numbers, tail and head recursion often return the same thing. When dealing with data structures, however, it can be a different matter altogether.
As you can see, recursion takes a bit of a shift in thinking. But once you make that shift, you can reap two benefits:
- No need for a counter
- More natural thinking
A word of caution: note that both functions will return incorrect values if you pass in a negative or non-integer value. So, adding the line:
if ($number < 0 || !is_int($number)) {
return NULL;
}
will help make these functions safe for general use by returning a value we can equate
with the mathematical term "undefined" - in this case, NULL.
Another Example: File System Traversal
The next example is far more useful:
<?php
//a simple file tree traversal - takes in a path and returns a nested array
$delim = strstr(PHP_OS, "WIN") ? "\\" : "/";
function retrieveTree($path) {
global $delim;
if ($dir=@opendir($path)) {
while (($element=readdir($dir))!== false) {
if (is_dir($path.$delim.$element) && $element!= "." && $element!= "..") {
$array[$element] = retrieveTree($path.$delim.$element);
} elseif ($element!= "." && $element!= "..") {
$array[] = $element;
}
}
closedir($dir);
}
return (isset($array) ? $array : false);
}
?>
Assuming the following file structure:
a call to:foo foo/bar foo/baz foo/baz/bop foo/baz/bop/file4.txt foo/baz/file2.txt foo/baz/file3.txt foo/file1.txt
print_r(retrieveTree('foo'));
will yield the result:
Array
(
[bar] =>
[baz] => Array
(
[bop] => Array
(
[0] => file4.txt
)
[0] => file2.txt
[1] => file3.txt
)
[0] => file1.txt
)
How does this function achieve a complete traversal of the file system? The function
actually iterates through all the elements of a directory. Then it examines each element.
If the element is a file, it is added to the array we are constructing. If it is a
directory, the result of a call to retrieveTree($element) is added into the
array.
A word of caution about global variables: In this case, we imported a global variable to decide how to delimit a file path. That's fine. However, setting global variables from inside a recursive function is generally frowned upon. This is because setting a global variable typically means you are trying to keep track of something outside the recursive scenario you have set up. In the end, global variables referenced from inside recursive functions can end up confusing you, your code, and all who look at it.
This function is termed depth-first, because as soon as the case structure detects a directory it chases off down that directory with a recursive call. The other form of tree traversal, breadth-first, involves examining all the files in a particular subdirectory (including other directories) first, then going down to the next level by examining a subdirectory.
Since we can not control the order in which the opendir() function gives us
directory elements, if we are going to consider all the elements before moving on to the
next subdirectory, we need to separate out the directories when we encounter them so that
we can traverse them only when we have looked at everything else in the directory. One
way to do this is to push all the directories we encounter into a temporary array, and
then iterate through that array to make recursive function calls on each directory. This
example uses another iterative loop to defer the recursive calls until the end:
<?php
//a breadth-first file tree traversal
$delim = strstr(PHP_OS, "WIN") ? "\\" : "/";
function retrieveTree($path) {
global $delim;
if ($dir = @opendir($path)) {
while (($element = readdir($dir)) !== false) {
if (is_dir($path.$delim.$element) && $element != "." && $element != "..") {
//$array[$element] = NULL;
$temp_array[] = $element;
} elseif ($element != "." && $element != "..") {
$array[] = $element;
}
}
if (isset($temp_array)) {
for ($i = 0; $i < sizeof($temp_array); $i++) {
$element = $temp_array[$i];
$array[$element] = retrieveTree($path.$delim.$element);
}
}
closedir($dir);
}
return (isset($array) ? $array : false);
}
?>
Notice that we never need to reset our $temp_array variable, because each
function call happens within a separate environment space, so the
$temp_array of the calling function does not get passed down to the next
function call unless it is passed as an argument to the function.
The result of:
print_r(retrieveTree('foo'));
is now:
Array
(
[0] => file1.txt
[bar] =>
[baz] => Array
(
[0] => file2.txt
[1] => file3.txt
[bop] => Array
(
[0] => file4.txt
)
)
)
Scanning from top-to-bottom, this new printout reveals that we have dealt with all the
elements in the directory before moving on to examine subdirectories, even if dealing
with a directory just means pushing it into a temporary array. If we wanted to preserve
the original array structure we achieved with the depth-first search, but go about
creating that structure in a breadth-first way, we could uncomment the line:
$array[$element] = NULL;
just before the directory $element is pushed into the
$temp_array. This would put the key value for the directory into the array
immediately, so that it can be set later with our recursive call. The result would look
like the results of our depth-first tree, except that the approach to getting that result
would be a breadth-first traversal.
Obviously, the purpose of distinguishing between breadth-first and depth-first searches is to construct efficient searches. If you are more likely to meet your exit condition (e.g. find the file you are looking for) by looking through everything in a directory before moving on, then you should be implementing the breadth-first algorithm.
Conclusion: Appropriate Use
Computer science theory tells us that everything that can be done iteratively can be done recursively, and vice versa. However, just because it is possible does not mean it is always appropriate or even easy to implement.
As we have seen, implementing a factorial function recursively is slightly more elegant in terms of the number of variables involved and the way you get to think about the problem. Approaching the problem recursively also means thinking about the factorial problem in much the same way that a mathematician might define a factorial. It is therefore arguably a more natural approach to an algorithm than using a counting variable.
When implementing a breadth-first search, we employed both recursion and iteration;
however, the critical working component was recursion - since the recursive case statement
outside the while loop was the part of the code that determined when to stop
moving through sub-trees. The iterative component really just grabbed all the elements at
each successive level of the file system. It was recursion that allowed us to go deeper.
The more you get comfortable with recursion, the more you will see how elegant and powerful it can be. An easy way to get started is to attempt to write functions that you would normally write iteratively in a recursive fashion. Remember to trace your function calls - the easiest way is to print the recursive function call. This is what we did in the factorial exercise. Another, more simplified example of tracing function calls is as follows:
<?php
function foo($bar) {
if($bar > 0 && !($bar < 0)) { //if $bar is greater than zero and not negative
$bar--;
print("Called foo($bar)");
foo($bar);
}
}
?>
The performance difference between recursion and iteration is typically pretty negligible, and the benefits gained make wrapping your head around recursion a worthwhile exercise.
In the case of tree traversals (e.g. file systems), recursion is clearly the best approach. In fact, any time the depth or breadth of a structure is unknown, using recursion makes much more sense than using iteration. With file system traversals it is possible but much, much more difficult to implement those functions iteratively. The process involves pushing information into "stacks" (in our case, we could use PHP arrays) for later retrieval and processing. This process is tantamount to writing an assembly-level process (a stack) in a high-level language like PHP. The fact that this can be done does not necessarily mean that it is the fastest, most elegant, or even easiest way to write code.
For these reasons it is clear that recursion has a valuable niche in computer science. It provides a more natural path through unknown structures, a more intuitive way of writing many algorithms, and a more elegant approach to writing some kinds of code. Hopefully this article has helped to establish recursion as an important and useful tool in your retinue of PHP tactics. Happy coding!
Further Reading
http://en.wikipedia.org/wiki/Recursive_function
http://en.wikipedia.org/wiki/Recursion
About The Author
Robert Peake taught programming languages to computer science students at Berkeley before earning his degree in Poetry. He recently resigned as an IT Manager and took up consulting to spend more time with his family and PHP. He lives in Los Angeles with his wife, Valerie, and their two laptops.
Robert can be contacted at robert@peakepro.com.


Comments (Login to leave comments)
Antonio