Refactoring PHP Code

By: Roy Ganor

Introduction

Martin Fowler is my mentor!

It is not only due to his paper on the emerging usage of Domain Specific Languages, nor solely because of his useful advice on continuous integration techniques, but because of the way he describes the refactoring process for computer languages.

At first, refactoring seemed to me to be magic. Over the years I came to view it as more of a trick, and today refactoring is integrated into my development environment, where I use it frequently and quickly. Using the refactoring functionality, in addition to other tools, I can sculpt the code to improve legibility and maintainability.

In this article I will present refactoring's strengths, and then argue that PHP developers and framework designers should adopt refactoring's capabilities immediately. Refactoring, together with other important tools, has helped PHP catch up with other languages that stress scalability for Web applications and enterprise Web services.

Definition

The definition of refactoring is:[1]

"... a disciplined technique for restructuring an existing body of code, altering its internal structure without changing its external behavior"

It is worth emphasizing the following parts of the definition:
  • Discipline - Refactoring follows a set of rules of conduct, which is what allows it to be executed automatically.
  • Restructuring - Through refactoring, code is improved by small (yet important) transformations.
  • Behavior - Gotcha! Everything sounds great until we unintentionally change the meaning of the code.
These three issues - Discipline, Restructuring and Behavior - guide us when considering refactoring.

Forms of Refactoring

A very common thought amongst PHP developers is that they will never have the privilege of refactoring. Yet, as many articles have shown [2][6], there is "a standard way for programmers and tools to perform refactoring no matter what language they work in". In fact, the very first refactoring tool was developed for the Smalltalk language [3], which is, in many ways, very similar to PHP. I elaborate on this later.

But before we dig into the PHP world, let's see what refactoring offers:
  • "Rename" is probably the most useful and desired operation that the refactoring world provides us. The developer alters the name of a selected element in the code. The selected element can be a class, attribute or method.

    Assuming you don't use one-letter identifier names like Donald Knuth does, you face this task frequently, since elements tend to change their purpose. Classes are usually generalized, method objectives are constantly expanded, and sometimes the developer completely changes a variable's initial purpose.

    One may consider a "creative" way to rename an element by using the editor's "search and replace" functionality, but this will probably turn out to be the right time to explore the "Undo" functionality as well. The complications are many: "search and replace" does not search for uses of the element in both the current resource and external ones, does not check whether an element with the same name already exists, and does not find inherited usages of the element. And there are many more. Therefore, before refactoring we first carry out a validation step. After refactoring we carry out a further action, which may trigger consequent refactoring operations. This pattern shows up in each and every refactoring process.


    Figure 1: Pre-conditions and post-conditions for renaming a class / "A Meta-model for Language-Independent Refactoring"

    In the next example, the user chooses to rename a local variable in the printOrder method.


    Code Snippet 1: The user wants to rename the item variable in the method



    Code Snippet 2: Applying the rename refactoring on the item variable results in the changes needed
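The original snippets are not reproduced here, so the following is only a hypothetical stand-in for what a rename of a local variable looks like. The order layout, the new name $orderLine and the use of plain functions instead of a printOrder method are all assumptions made for illustration.

```php
<?php
// Hypothetical stand-in for the lost snippets; names and data layout are assumptions.

// Before the rename: the loop variable is called $item.
function printOrderBefore(array $order): string
{
    $output = '';
    foreach ($order['items'] as $item) {
        $output .= $item['name'] . ' x' . $item['quantity'] . "\n";
    }
    return $output;
}

// After renaming $item to $orderLine: every occurrence in the scope changes,
// while the string keys 'items', 'name' and 'quantity' are left untouched.
function printOrderAfter(array $order): string
{
    $output = '';
    foreach ($order['items'] as $orderLine) {
        $output .= $orderLine['name'] . ' x' . $orderLine['quantity'] . "\n";
    }
    return $output;
}

$order = ['items' => [['name' => 'Book', 'quantity' => 2]]];
```

A plain "search and replace" of "item" would also have hit the 'items' array key, which is exactly the kind of accident the validation step is meant to prevent.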

  • "Extract" is the most valuable operation to the professional developer. The idea is to make sure we don't have duplicated code or long, tedious methods. Multiple occurrences of a particular section of code can be automatically identified and replaced with a call to a single extracted copy. The selected code can be extracted into a variable, a method, or even a class (as a super-class).

    Taking for example the previous code snippet (code snippet 1), one can extract a section of the code that prints the status of the order.


    Code Snippet 3: Extracting a section of code into a separate method that can be reused later on.
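As a rough illustration of the extraction described above (not the original snippet), the sketch below moves the lines that print the order status into their own function. The function names and the order layout are assumptions.

```php
<?php
// Hypothetical example; the order layout and the status text are assumptions.

// Before: the status line is built inline inside the printing function.
function printOrderInline(array $order): string
{
    $output = 'Order #' . $order['id'] . "\n";
    $output .= 'Status: ' . strtoupper($order['status']) . "\n";
    return $output;
}

// After: the status section lives in its own function and can be reused.
function formatOrderStatus(array $order): string
{
    return 'Status: ' . strtoupper($order['status']) . "\n";
}

function printOrder(array $order): string
{
    $output = 'Order #' . $order['id'] . "\n";
    $output .= formatOrderStatus($order);
    return $output;
}

$order = ['id' => 7, 'status' => 'shipped'];
```

Both versions produce the same output, which is the "external behavior" the definition asks us to preserve.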

  • Sometimes the developer wants to make a change, but it is so complicated that he just gives up. "Change signature", the operation of adding, removing or changing one or more of a method's parameters, lets us make a really tedious change in one step. It alters the signature of a method and, of course, all the affected methods and their callers.
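A minimal sketch of what "Change signature" has to touch, assuming a hypothetical sendInvoice function that gains a $currency parameter. The names and the 'USD' default chosen for existing callers are illustrative only.

```php
<?php
// Before the change the declaration read: function sendInvoice(string $email): string
// After the change a $currency parameter has been added.
function sendInvoice(string $email, string $currency): string
{
    return "Invoice sent to {$email} in {$currency}";
}

// A call site that previously read sendInvoice($customerEmail).
// The refactoring inserts the value chosen for the new parameter.
function checkout(string $customerEmail): string
{
    return sendInvoice($customerEmail, 'USD');
}
```

The declaration and every call site change together, which is why doing this by hand across a large code base is so tedious.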

  • "Pull up" and "Push down" operations help developers to easily apply design changes, mainly by improving their system's class structure and reusability. They move information or functionality from a class to its super-class (Pull up) or to its sub-classes (Push down). For example, a pull up operation easily makes functionality that already exists in one class available to its parent and siblings.



    Figure 2: Simulating a Pull up refactoring operation on a class to its super-class. (a) Each circle denotes a class in our system, the arrows represent the hierarchy relations (b) The marked class includes augmented functionality that needs to be pulled up, and be shared with all siblings (c) The resulting class structure, with the shared functionality
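The result of a pull up can be sketched as follows. The Shape, Circle and Square classes are hypothetical, and the sketch assumes describe() was originally declared only in Circle.

```php
<?php
// Hypothetical hierarchy; Shape, Circle and Square are assumed names.
abstract class Shape
{
    abstract public function area(): float;

    // Before "Pull up" this method was declared only in Circle.
    // In the super-class it is shared by every sibling.
    public function describe(): string
    {
        return static::class . ' with area ' . number_format($this->area(), 2);
    }
}

final class Circle extends Shape
{
    private float $radius;

    public function __construct(float $radius)
    {
        $this->radius = $radius;
    }

    public function area(): float
    {
        return M_PI * $this->radius ** 2;
    }
}

final class Square extends Shape
{
    private float $side;

    public function __construct(float $side)
    {
        $this->side = $side;
    }

    public function area(): float
    {
        return $this->side ** 2;
    }
}
```

Square never declared describe(), yet after the pull up it can call it, which corresponds to step (c) in the figure.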

  • High-level refactoring is the next generation of the refactoring world. It consists of several low-level refactoring operations which are sequentially integrated into one refactoring rule. This may involve several software quality issues such as higher modularity, performance improvement, lower code redundancy, and so on. It is mainly targeted at developers that usually work with well-known design patterns and want to apply them on their own code.

Refactoring in Dynamically Typed Languages such as PHP

In this section I outline how refactoring operates, then I focus on the PHP side.

As I mentioned before, any language can have a refactoring tool. The ingredients are:
  • Model representation of source code. This model helps us understand the structure of the code.
  • Set of rules for each refactoring task. The rules will be applied to perform the action. Each rule has an operative task on the structure of the model.
To apply the refactoring process we run the rules on the model and then commit accepted changes on the code.
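To make the two ingredients concrete, here is a deliberately tiny sketch. CodeModel and RenameClassRule are hypothetical names, and the model only records which files reference each class; a real tool would hold far more.

```php
<?php
// Hypothetical miniature model: class name => files that reference it.
final class CodeModel
{
    /** @var array<string, string[]> */
    public array $classes = [];
}

// A rule couples a validation step with a transformation of the model.
final class RenameClassRule
{
    private string $oldName;
    private string $newName;

    public function __construct(string $oldName, string $newName)
    {
        $this->oldName = $oldName;
        $this->newName = $newName;
    }

    /** Returns a list of problems; an empty list means the rule may run. */
    public function validate(CodeModel $model): array
    {
        $problems = [];
        if (!isset($model->classes[$this->oldName])) {
            $problems[] = "Class {$this->oldName} does not exist";
        }
        if (isset($model->classes[$this->newName])) {
            $problems[] = "Class {$this->newName} already exists";
        }
        return $problems;
    }

    /** Updates the model and returns the files that must be rewritten. */
    public function apply(CodeModel $model): array
    {
        $affectedFiles = $model->classes[$this->oldName];
        unset($model->classes[$this->oldName]);
        $model->classes[$this->newName] = $affectedFiles;
        return $affectedFiles;
    }
}

$model = new CodeModel();
$model->classes = [
    'Product' => ['Product.php', 'Cart.php'],
    'Item'    => ['Item.php'],
];
```

Renaming Product to Item would be rejected by validate(), whereas renaming it to an unused name passes validation and apply() reports the files whose text should change once the user accepts.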

There are two models that can be used. Both are language independent and used by many development tools:

  1. Abstract Syntax Tree (AST), a fairly detailed data structure used to represent a program's code. It includes the control flow of the program as well as information about each of the statement's components. This information is kept as a tree that helps us scan the structure of a program.


    Figure 3: Expression node and its subclasses. Each node represents a section of code. For example the CondEx node represents an "if" clause and it includes a predicate expression, true branch and false branch as child nodes.

  2. An extensible model for object-oriented systems. This model puts the focus on the class hierarchy, field access and method invocations. It is also quite minimalist in the way information is stored, so only relevant information about the relations of classes is kept, excluding information about the actual way it is implemented in the program.

    Figure 4: FAMIX model provides a language-independent representation of object-oriented relation information
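The tree model can be sketched in a few lines. Only the CondEx name comes from Figure 3; the Expression base class, the VarEx node and the collectVariableNames helper are assumptions made for this illustration.

```php
<?php
// Every node in the tree is an Expression that can list its child nodes.
abstract class Expression
{
    /** @return Expression[] */
    abstract public function children(): array;
}

// Hypothetical leaf node: a reference to a variable.
final class VarEx extends Expression
{
    public string $name;

    public function __construct(string $name)
    {
        $this->name = $name;
    }

    public function children(): array
    {
        return [];
    }
}

// An "if" clause: predicate, true branch and false branch are child nodes.
final class CondEx extends Expression
{
    private Expression $predicate;
    private Expression $trueBranch;
    private Expression $falseBranch;

    public function __construct(Expression $predicate, Expression $trueBranch, Expression $falseBranch)
    {
        $this->predicate = $predicate;
        $this->trueBranch = $trueBranch;
        $this->falseBranch = $falseBranch;
    }

    public function children(): array
    {
        return [$this->predicate, $this->trueBranch, $this->falseBranch];
    }
}

// Walks the tree depth-first and collects every variable name it meets.
function collectVariableNames(Expression $node): array
{
    $names = $node instanceof VarEx ? [$node->name] : [];
    foreach ($node->children() as $child) {
        $names = array_merge($names, collectVariableNames($child));
    }
    return $names;
}

$tree = new CondEx(new VarEx('inStock'), new VarEx('price'), new VarEx('backorderPrice'));
```

A rename rule would use a walk like collectVariableNames to locate every occurrence of the selected element before changing anything.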

The fact that PHP is a dynamically typed language makes code modeling a very hard task. Moreover, sometimes we cannot guarantee a 100% precise model, and the user will have to be involved in the decisions made.

There are two major methods for constructing a model of PHP code (and of dynamically typed languages in general):
  1. Using the PHP interpreter and running the code [3]: we actually run the user's code to create the precise relations in the model. For example, to find all occurrences of a function, we follow these steps:
    1. Rename the original function and create a function with the old name that adds the caller (the place that calls the function) to a list and then calls the renamed function.
    2. Exercise the application with a test suite.
    3. Get the list of callers and update the function name at each of them. Finally, delete the function with the old name.

  2. Soft typing [5] - using static analysis methods to build the model without actually executing the program. This method analyzes the AST model and tries to form an extended model about the relations of the classes in the program. For example, if the user assigns a new object to a variable then the type is bound to the variable in the specific scope and the relation is resolved.
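The three steps of the first method can be sketched with PHP's built-in debug_backtrace function. The function names, the "__original" suffix and the CallerLog class are hypothetical; the single call to checkoutTotal stands in for a whole test suite.

```php
<?php
// Collects the call sites seen while the application is exercised.
final class CallerLog
{
    /** @var string[] call sites in "file:line" form */
    public static array $sites = [];
}

// Step 1a: the original function, renamed by the tool (the new name is hypothetical).
function calculateTotal__original(array $prices): float
{
    return (float) array_sum($prices);
}

// Step 1b: a wrapper with the old name records where it was called from,
// then delegates to the renamed function.
function calculateTotal(array $prices): float
{
    $frame = debug_backtrace(DEBUG_BACKTRACE_IGNORE_ARGS, 1)[0];
    CallerLog::$sites[] = basename($frame['file']) . ':' . $frame['line'];
    return calculateTotal__original($prices);
}

// Application code that still uses the old name.
function checkoutTotal(): float
{
    return calculateTotal([10.0, 5.5]);
}

// Step 2: exercise the application (a stand-in for running the test suite).
checkoutTotal();

// Step 3: CallerLog::$sites now lists every call site to rewrite,
// after which the wrapper with the old name can be deleted.
```

Only call sites that were actually executed end up in the list, which is why this method is only as good as the test suite behind it.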
These two techniques differ in their approach, and each has its advantages and disadvantages. The first method is very precise, but it depends on a test suite with 100% coverage: each and every possible path in the program's flow must be exercised. The second approach, on the other hand, doesn't need a test suite; it tries to discern the context of the code from its structure and the semantics of the language. Soft typing is done in a conservative way, i.e. the class relations are complete but may include redundant relations.
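A very small flavour of the static approach, and of what "conservative" means, can be given with PHP's tokenizer. This is only a sketch under strong assumptions: inferNewedTypes is a hypothetical helper, it works on the token stream rather than a full AST, and it recognises only the pattern "$var = new ClassName".

```php
<?php
/**
 * Returns, for each variable, the classes assigned to it via "$var = new ClassName".
 * Branches are not analysed, so every assignment counts: the result may
 * contain types that never occur together at run time.
 *
 * @return array<string, string[]>
 */
function inferNewedTypes(string $source): array
{
    // Drop whitespace and comments so the pattern is four adjacent tokens.
    $ignored = [T_WHITESPACE, T_COMMENT, T_DOC_COMMENT];
    $tokens = array_values(array_filter(
        token_get_all($source),
        static fn($token) => !is_array($token) || !in_array($token[0], $ignored, true)
    ));

    $types = [];
    for ($i = 0; $i + 3 < count($tokens); $i++) {
        [$var, $eq, $new, $class] = array_slice($tokens, $i, 4);
        if (is_array($var) && $var[0] === T_VARIABLE
            && $eq === '='
            && is_array($new) && $new[0] === T_NEW
            && is_array($class) && $class[0] === T_STRING
        ) {
            // Use the class name as a key so duplicates collapse.
            $types[$var[1]][$class[1]] = true;
        }
    }
    return array_map('array_keys', $types);
}

$sample = <<<'CODE'
<?php
if ($premium) {
    $item = new Product();
} else {
    $item = new Gift();
}
$cart = new Cart();
CODE;
```

For the sample, $item is bound to both Product and Gift because the analysis cannot know which branch runs. A rename of a Product member would therefore offer $item's usages as candidates, and the user may have to confirm which ones really apply.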

Development Tools

The Zend Development Tools group has invested effort in integrating this component into its next-generation IDE. I took a test drive with some examples I had in mind:

Rename Variable

Let's say that the user holds a product entry of a "Product" table in a variable named data. He realizes that the code would be clearer if this variable were named productEntry, so he invokes the refactoring tool on the variable and gives it the new name. He then gets a preview, which he can verify before confirming the changes:


Figure 5: The preview pane that shows the changes in the document that should be applied in order to refactor.

Rename Class Properties

A developer decides to rename a property (class member) in his Department model. Since he is using an OO methodology and an MVC pattern, the change should affect all related files. He renames the department name field, and the tool applies the change in the model, the controller and the viewer.


Figure 6: The changes in the EntryModel and the EntryViewer files.

Conclusions

Refactoring is the process of transforming code without changing the semantics. It is used frequently in the Agile methodologies to support system evolution and improve maintainability. A refactoring "culture" in the team can improve the communication between the members and can help apply design decisions when needed.

A refactoring tool should be integrated into standard development environments so it can be used quickly. Refactoring must also be reasonably accurate, and should highlight possible vulnerabilities.

Finally, it seems that PHP users need refactoring tools, and such tools now seem closer than ever.

References

  1. Martin Fowler's site on refactoring http://www.refactoring.com
  2. A Meta-model for Language-Independent Refactoring, Sander Tichelaar, Stéphane Ducasse, Serge Demeyer, Oscar Nierstrasz.
  3. A Refactoring Tool for Smalltalk, Don Roberts, John Brant, and Ralph Johnson
  4. An Automated Refactoring Approach To Design Pattern-Based Program Transformations In Java Programs, Sang-Uk Jeon, Joon-Sang Lee, and Doo-Hwan Bae.
  5. Soft typing and analyses on PHP programs, Patrick Camphuijsen
  6. SCG / FAMOOS / FAMIX web site http://www.iam.unibe.ch/~famoos/FAMIX/