Refactoring PHP Code

September 17, 2007

By: Roy Ganor

Introduction

Martin Fowler is my mentor!

This is not only because of his paper on the emerging use of Domain Specific Languages, nor solely because of his useful advice on continuous integration techniques, but above all because of the way he describes the refactoring process for computer languages.


At first, refactoring seemed like magic to me. Over the years I have come to view it as more of a trick, and today refactoring is integrated into my development environment, where I use it frequently and quickly. Using the refactoring functionality, along with other tools, I can sculpt code to improve its legibility and maintainability.


In this article I will present refactoring’s strengths and then argue that PHP developers and framework designers should adopt refactoring capabilities immediately. Together with other important tools, refactoring is helping PHP catch up with the languages that stress scalability for enterprise Web applications and Web services.

Definition

The definition of refactoring is [1]:




"… a disciplined technique for restructuring an existing body of code, altering its internal structure without changing its external behavior"


It is worth emphasizing the following parts of the definition:

  • Discipline – Refactoring follows a set of rules of conduct, which is what allows it to be executed automatically.
  • Restructuring – Through refactoring, code is improved by small (yet important) transformations.
  • Behavior – Gotcha! Everything sounds great until we unintentionally change the meaning of the code.

These three issues – Discipline, Restructuring and Behavior – guide us when considering refactoring.

Forms of Refactoring

A very common belief among PHP developers is that they will never have the privilege of refactoring. Yet, as many articles have recently shown [2][6], there is "a standard way for programmers and tools to perform refactoring no matter what language they work in". In fact, the very first refactoring tool was developed for Smalltalk [3], a dynamically typed language that is in many ways very similar to PHP. I will elaborate on this later.

But before we dig into the PHP world, let’s see what refactoring offers:

  • "Rename" is probably the most useful and desired operation that the refactoring world provides. The developer changes the name of a selected element in the code, which can be a class, an attribute, a method or a variable.


    Assuming you don’t use one-letter identifier names like Donald Knuth does, you face this task frequently, since elements tend to change their purpose. Classes are usually generalized, method objectives are constantly expanded, and sometimes a variable ends up serving a completely different purpose from the one it was named for.


    One may be tempted to rename an element “creatively” with the editor’s "search and replace" functionality, but that usually turns out to be the right time to get acquainted with its "Undo" functionality as well. The complications are immense. "Search and replace" does not tell the element apart from other elements whose names merely contain the same text, does not look for uses of the element in external resources as well as in the current one, does not check for an existing element that already has the new name, and does not find inherited usages of the element. And there are many more.
    Therefore, before refactoring we first carry out a validation step (the pre-conditions). After refactoring we also carry out an action (the post-conditions), which may in turn trigger further refactoring operations. This pattern shows up in every refactoring process.


    Figure 1: Pre-conditions and post-conditions of renaming a class (from "A Meta-model for Language-Independent Refactoring" [2])



    In the next example, the user chooses to rename a local variable in the printOrder method.




    Code Snippet 1: The user wants to rename the item variable in the method





    Code Snippet 2: The changes that result from applying the rename refactoring to the item variable

  • "Extract" is the most valuable operation for the professional developer. The idea here is to make sure we don’t have duplicated code or long, tedious methods. Multiple instances of a particular section of code can be identified automatically and replaced with calls to a single shared implementation. The selected code can be extracted into a variable, a method, or even a class (as a super class).


    Taking Code Snippet 1 as an example, one can extract the section of code that prints the status of the order.




    Code Snippet 3: Extracting a section of code into a separate method that can be reused later on.

  • Sometimes a developer wants to make a change, but it is so complicated that he just gives up. "Change signature", the operation of adding, removing or changing one or more of a method’s parameters, lets us make such a tedious change in one step: it alters the signature of the method and, of course, all the affected code, such as its callers and any overriding methods.

  • "Pull up" and "Push down" operations help developers apply design changes easily, mainly by improving their system’s class structure and reusability. They move fields or methods from a class to its super-class (Pull up) or to its sub-classes (Push down). For example, a pull up operation can move functionality that already exists in one class up to its parent, so that the class’s siblings share it as well.





    Figure 2: Simulating a Pull up refactoring operation on a class to its super-class. (a) Each circle denotes a class in our system; the arrows represent the hierarchy relations. (b) The marked class contains additional functionality that needs to be pulled up and shared with all its siblings. (c) The resulting class structure, with the shared functionality.

  • High-level refactoring is the next generation of the refactoring world. It combines several low-level refactoring operations, applied in sequence, into a single refactoring rule. Such a rule can target software quality goals such as higher modularity, better performance, lower code redundancy, and so on. It is mainly aimed at developers who work with well-known design patterns and want to apply them to their own code [4].
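The search-and-replace pitfalls described under "Rename" can be made concrete with a minimal Python sketch. It works on a hypothetical PHP fragment and uses a regular expression as a stand-in for a real PHP tokenizer, so it is an illustration, not how any IDE implements the operation. The naive version also rewrites `$items` when asked to rename `$item`; the token-aware version touches whole variable names only, and first checks two pre-conditions: the old name must occur, and the new name must not already be taken.

```python
import re

# Hypothetical PHP fragment, used only for illustration.
PHP_SOURCE = """<?php
function printOrder($order) {
    $items = $order->getItems();
    foreach ($items as $item) {
        echo $item->getName();
    }
}
"""

# Stand-in for a PHP tokenizer: a variable is "$" followed by an identifier.
VARIABLE = re.compile(r"\$([A-Za-z_][A-Za-z0-9_]*)")


def naive_rename(source: str, old: str, new: str) -> str:
    """Plain search and replace: renaming $item to $entry also turns $items into $entrys."""
    return source.replace("$" + old, "$" + new)


def rename_variable(source: str, old: str, new: str) -> str:
    """Rename whole variable tokens only, after checking the pre-conditions."""
    names = {match.group(1) for match in VARIABLE.finditer(source)}
    if old not in names:
        raise ValueError(f"${old} does not occur in this code")
    if new in names:
        raise ValueError(f"${new} already exists; the rename would merge two variables")
    return VARIABLE.sub(
        lambda match: "$" + new if match.group(1) == old else match.group(0),
        source,
    )


if __name__ == "__main__":
    print(naive_rename(PHP_SOURCE, "item", "entry"))
    print(rename_variable(PHP_SOURCE, "item", "entry"))
```

For brevity the sketch treats the whole fragment as one scope and does not skip strings or comments; a real tool would resolve scopes on a syntax tree of the code.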
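The idea that repeated sections of code can be identified automatically, as candidates for "Extract", can be sketched with a simple line-based clone detector in Python. This is one illustrative technique, not necessarily what any particular IDE does: it normalizes whitespace, slides a window of `size` consecutive lines over the file, and reports every window that occurs more than once. The PHP lines are hypothetical.

```python
from collections import defaultdict


def duplicated_blocks(lines, size=3):
    """Map each run of `size` lines that occurs more than once to its 1-based start lines."""
    # Collapse whitespace so that differences in indentation do not hide a clone.
    normalized = [" ".join(line.split()) for line in lines]
    occurrences = defaultdict(list)
    for start in range(len(normalized) - size + 1):
        window = tuple(normalized[start:start + size])
        if all(window):  # ignore windows that contain blank lines
            occurrences[window].append(start + 1)
    return {window: starts for window, starts in occurrences.items() if len(starts) > 1}


# Hypothetical PHP lines: the order header is printed twice.
PHP_LINES = [
    "echo 'Order ' . $order->getId();",
    "echo 'Status: ' . $order->getStatus();",
    "echo '---';",
    "$total = 0;",
    "    echo 'Order ' . $order->getId();",
    "    echo 'Status: ' . $order->getStatus();",
    "    echo '---';",
]

if __name__ == "__main__":
    for window, starts in duplicated_blocks(PHP_LINES).items():
        print(f"lines {starts} repeat:", *window, sep="\n  ")
```

Each reported block is a candidate for extraction into a single method; a longer clone shows up as several overlapping windows, which a real tool would merge before proposing the refactoring.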
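The "Pull up" operation of Figure 2 can be sketched in Python as a transformation on a tiny class model. The `ClassModel` type, the class names and the method bodies below are all hypothetical. The pre-conditions are that the class has a superclass and that the superclass does not already define the method. Siblings holding an identical copy drop it and inherit the pulled-up version, while siblings with a different body keep theirs as an override. The `resolve` helper checks that every class still runs the same code afterwards, which is the "without changing external behavior" part of the definition.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ClassModel:
    name: str
    parent: Optional[str] = None
    methods: Dict[str, str] = field(default_factory=dict)  # method name -> body


def resolve(classes: Dict[str, ClassModel], name: str, method: str) -> str:
    """Return the body that a call to `method` on class `name` would run."""
    current: Optional[ClassModel] = classes[name]
    while current is not None:
        if method in current.methods:
            return current.methods[method]
        current = classes[current.parent] if current.parent else None
    raise LookupError(f"{name} has no method {method}")


def pull_up(classes: Dict[str, ClassModel], name: str, method: str) -> None:
    """Move `method` from class `name` into its superclass."""
    cls = classes[name]
    # Pre-conditions.
    if cls.parent is None:
        raise ValueError(f"{name} has no superclass")
    if method not in cls.methods:
        raise ValueError(f"{name} does not define {method}")
    parent = classes[cls.parent]
    if method in parent.methods:
        raise ValueError(f"{parent.name} already defines {method}")
    # Transformation.
    body = cls.methods.pop(method)
    parent.methods[method] = body
    # Siblings with an identical copy now inherit it; different bodies stay as overrides.
    for other in classes.values():
        if other.parent == parent.name and other.methods.get(method) == body:
            del other.methods[method]


def example_hierarchy() -> Dict[str, ClassModel]:
    return {
        "Document": ClassModel("Document"),
        "Invoice": ClassModel("Invoice", "Document", {"printHeader": "echo $this->title;"}),
        "Receipt": ClassModel("Receipt", "Document", {"printHeader": "echo $this->title;"}),
        "Quote": ClassModel("Quote", "Document", {"printHeader": "echo 'QUOTE';"}),
    }


if __name__ == "__main__":
    classes = example_hierarchy()
    names = ("Invoice", "Receipt", "Quote")
    before = {c: resolve(classes, c, "printHeader") for c in names}
    pull_up(classes, "Invoice", "printHeader")
    after = {c: resolve(classes, c, "printHeader") for c in names}
    print(before == after)  # True: every class still runs the same body
```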

Refactoring in Dynamically Typed Languages such as PHP

In this section I outline how refactoring works and then focus on the PHP side.


As I mentioned before, any language can have a refactoring tool. The ingredients are:

  • A model representation of the source code. This model helps us understand the structure of the code.
  • A set of rules for each refactoring task. Applying the rules performs the action; each rule operates on the structure of the model.

To apply a refactoring, we run its rules on the model and then commit the accepted changes to the code.
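The recipe above (a model, a set of rules, and a final commit) can be sketched in a few lines of Python. Everything here is hypothetical and deliberately small: the model is just a map from class names to method names, and each `Rule` pairs a pre-condition with a transformation. The rules run on a working copy, so if any pre-condition fails nothing is committed. Passing several rules at once also illustrates the idea behind high-level refactoring: low-level operations applied in sequence as one unit.

```python
from __future__ import annotations

import copy
from dataclasses import dataclass
from typing import Callable, Dict, List, Set

# Hypothetical model: class name -> names of its methods.
Model = Dict[str, Set[str]]


@dataclass
class Rule:
    """One refactoring rule: a pre-condition and a transformation of the model."""
    description: str
    precondition: Callable[[Model], bool]
    transform: Callable[[Model], None]


def rename_class(old: str, new: str) -> Rule:
    def transform(model: Model) -> None:
        model[new] = model.pop(old)

    return Rule(f"rename class {old} to {new}",
                lambda model: old in model and new not in model, transform)


def rename_method(cls: str, old: str, new: str) -> Rule:
    def transform(model: Model) -> None:
        model[cls].remove(old)
        model[cls].add(new)

    return Rule(f"rename {cls}::{old} to {new}",
                lambda model: cls in model and old in model[cls] and new not in model[cls],
                transform)


def apply_rules(model: Model, rules: List[Rule]) -> Model:
    """Run the rules in order on a copy; return the copy only if all of them apply."""
    working = copy.deepcopy(model)
    for rule in rules:
        if not rule.precondition(working):
            raise ValueError(f"pre-condition failed: {rule.description}")
        rule.transform(working)
    return working  # the caller commits this accepted model back to the code


if __name__ == "__main__":
    model = {"Order": {"print", "total"}}
    steps = [rename_class("Order", "PurchaseOrder"),
             rename_method("PurchaseOrder", "print", "printOrder")]
    print(apply_rules(model, steps))
```

Note that the second rule's pre-condition is evaluated against the model as already changed by the first one, which is what makes a sequence of small rules behave as a single larger refactoring.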


Two models can be used; both are language independent and used by many development tools:

  1. An Abstract Syntax Tree (AST), a fairly detailed data structure that represents a program’s code. It captures the control flow of the program as well as the components of each statement. This information is kept as a tree, which lets us scan the structure of a program.




    Figure 3: Expression node and its subclasses. Each node represents a section of code. For example, the CondEx node represents an “if” clause and has a predicate expression, a true branch and a false branch as child nodes.

  2. FAMIX [6], an extensible model for object-oriented systems. This model focuses on the class hierarchy, field accesses and method invocations. It is also quite minimalist in the way information is stored: only the relevant information about the relations between classes is kept, while the details of how each class is actually implemented are left out.



    Figure 4: The FAMIX model provides a language-independent representation of object-oriented relation information.
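To make the AST concrete, here is a minimal Python sketch of expression nodes in the spirit of Figure 3. The node classes are hypothetical (only `CondEx` takes its name from the figure), and the tree is built by hand for the PHP conditional expression `$total > 100 ? $total * 0.9 : $total` rather than produced by a real PHP parser. Because the tree records exactly which nodes are variables, a rename can rewrite `Var` nodes and nothing else.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterator, Union


@dataclass(frozen=True)
class Var:
    name: str            # a PHP variable, without the "$"


@dataclass(frozen=True)
class Const:
    value: object


@dataclass(frozen=True)
class BinOp:
    op: str
    left: Expr
    right: Expr


@dataclass(frozen=True)
class CondEx:
    predicate: Expr      # the condition
    true_branch: Expr    # evaluated when the predicate holds
    false_branch: Expr   # evaluated otherwise


Expr = Union[Var, Const, BinOp, CondEx]


def variables(node: Expr) -> Iterator[str]:
    """Yield the name of every variable reference, in source order."""
    if isinstance(node, Var):
        yield node.name
    elif isinstance(node, BinOp):
        yield from variables(node.left)
        yield from variables(node.right)
    elif isinstance(node, CondEx):
        for child in (node.predicate, node.true_branch, node.false_branch):
            yield from variables(child)


def rename(node: Expr, old: str, new: str) -> Expr:
    """Return a new tree in which every Var named `old` is named `new`."""
    if isinstance(node, Var):
        return Var(new) if node.name == old else node
    if isinstance(node, BinOp):
        return BinOp(node.op, rename(node.left, old, new), rename(node.right, old, new))
    if isinstance(node, CondEx):
        return CondEx(rename(node.predicate, old, new),
                      rename(node.true_branch, old, new),
                      rename(node.false_branch, old, new))
    return node  # constants are left untouched


# $total > 100 ? $total * 0.9 : $total
TREE = CondEx(BinOp(">", Var("total"), Const(100)),
              BinOp("*", Var("total"), Const(0.9)),
              Var("total"))

if __name__ == "__main__":
    print(list(variables(rename(TREE, "total", "amount"))))  # ['amount', 'amount', 'amount']
```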
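The relation-only approach of a FAMIX-style model can likewise be sketched in Python. The classes below are a hypothetical, heavily simplified stand-in loosely modeled on FAMIX concepts (inheritance, invocations and accesses), not the actual FAMIX meta-model; the Department names echo the MVC example later in this article and are illustrative only. Questions such as "who calls this method?" or "who uses this attribute?", which a rename must answer, become simple queries over the stored relations.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass(frozen=True)
class Invocation:
    caller: str   # "Class::method" that contains the call
    callee: str   # "Class::method" that may be invoked


@dataclass(frozen=True)
class Access:
    accessor: str   # "Class::method" that reads or writes the attribute
    attribute: str  # "Class::$attribute"


@dataclass
class RelationModel:
    superclass: Dict[str, str] = field(default_factory=dict)  # class -> its superclass
    invocations: List[Invocation] = field(default_factory=list)
    accesses: List[Access] = field(default_factory=list)

    def ancestors(self, cls: str) -> List[str]:
        """Superclasses of `cls`, nearest first."""
        chain = []
        while cls in self.superclass:
            cls = self.superclass[cls]
            chain.append(cls)
        return chain

    def callers_of(self, method: str) -> Set[str]:
        return {i.caller for i in self.invocations if i.callee == method}

    def accessors_of(self, attribute: str) -> Set[str]:
        return {a.accessor for a in self.accesses if a.attribute == attribute}


MODEL = RelationModel(
    superclass={"Department": "Model"},
    invocations=[
        Invocation("DepartmentController::show", "Department::getName"),
        Invocation("DepartmentView::render", "Department::getName"),
    ],
    accesses=[Access("Department::getName", "Department::$name")],
)

if __name__ == "__main__":
    print(MODEL.ancestors("Department"))                    # ['Model']
    print(sorted(MODEL.callers_of("Department::getName")))  # ['DepartmentController::show', 'DepartmentView::render']
    print(sorted(MODEL.accessors_of("Department::$name")))  # ['Department::getName']
```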



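Because PHP is a dynamically typed language, modeling its code is a very hard task: the class of the object held by a variable, and therefore the method that a call refers to, is often known only at run time. As a result, we cannot always guarantee a 100% precise model, and the user sometimes has to be involved in the decisions.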


There are two major methods for constructing a PHP model (and, in general, a model of any dynamically typed language):

  1. Running the code [3] – we actually run the user’s code in the PHP interpreter to create the precise relations in the model. For example, to find all occurrences (calls) of a function that is being renamed, we follow these steps:
    1. Rename the original function to its new name and create a function with the old name that adds its caller (the place that calls the function) to a list and then calls the renamed function.
    2. Exercise the application on a test suite.
    3. Get the list of callers and change each call to use the new name; at the end, delete the function with the old name.


  2. Soft typing [5] – using static analysis methods to build the model without actually executing the program. This method analyzes the AST and tries to form an extended model of the relations between the classes in the program. For example, if the code assigns a new object to a variable, the object’s type is bound to the variable in that scope and the relation is resolved.
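The three steps of the first method can be sketched in Python, whose introspection makes the recording wrapper easy to write; in PHP the wrapper could obtain its caller with `debug_backtrace()`. The functions `format_price`, `print_order` and `print_invoice` are hypothetical, and instead of literally renaming the original, the sketch keeps a reference to it inside the wrapper. Note what happens when the test suite never calls `print_invoice`: that caller is never recorded, which is why this method depends on complete test coverage.

```python
import functools
import inspect


def format_price(amount):       # the function the user wants to rename
    return f"${amount:.2f}"


def print_order(total):
    return "Total: " + format_price(total)


def print_invoice(total, tax):  # also a caller, but no test exercises it
    return "Invoice: " + format_price(total + tax)


def run_test_suite():
    """Exercise the application (this suite only covers print_order)."""
    assert print_order(12.5) == "Total: $12.50"


def find_callers(name):
    """Steps 1 and 2: record which functions call `name` while the test suite runs."""
    callers = set()
    original = globals()[name]

    @functools.wraps(original)
    def wrapper(*args, **kwargs):
        callers.add(inspect.stack()[1].function)  # name of the calling function
        return original(*args, **kwargs)

    globals()[name] = wrapper        # step 1: the old name now points to the wrapper
    try:
        run_test_suite()             # step 2: exercise the application
    finally:
        globals()[name] = original   # remove the wrapper again
    return callers


if __name__ == "__main__":
    # Step 3 (not shown): update each recorded call site to the new name.
    print(sorted(find_callers("format_price")))  # ['print_order']; print_invoice was missed
```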

These two techniques differ in their approach, and each has its advantages and disadvantages. While the first method is very precise, it depends on a test suite with 100% coverage, so that every possible path in the program’s flow is exercised; a caller on a path the tests never reach is simply not recorded. The second approach, on the other hand, needs no test suite and tries to discern the context of the code from its structure and the semantics of the language. Soft typing is done in a conservative way, i.e. the inferred class relations are complete but may include redundant ones.
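A toy version of soft typing shows why its results are complete but may contain redundant relations. The sketch below is not the analysis from [5]; it is a hypothetical Python illustration over a made-up statement format rather than a real PHP AST. Each variable is mapped to the set of classes it may hold: `new` binds a class, an assignment copies the set, and after an `if` the two branches are merged by taking the union. A method call is then related to every class in its receiver's set.

```python
def analyze(statements, env=None, calls=None):
    """Return (env, calls): possible classes per variable, and each call with its candidate classes."""
    env = dict(env or {})
    calls = [] if calls is None else calls
    for statement in statements:
        kind = statement[0]
        if kind == "new":        # $var = new Class();
            _, var, cls = statement
            env[var] = frozenset({cls})
        elif kind == "assign":   # $var = $other;
            _, var, other = statement
            env[var] = env.get(other, frozenset())
        elif kind == "if":       # if (...) { then_branch } else { else_branch }
            _, then_branch, else_branch = statement
            then_env, _ = analyze(then_branch, env, calls)
            else_env, _ = analyze(else_branch, env, calls)
            # Conservative merge: after the if, a variable may hold a class from either branch.
            env = {var: then_env.get(var, frozenset()) | else_env.get(var, frozenset())
                   for var in then_env.keys() | else_env.keys()}
        elif kind == "call":     # $var->method();
            _, var, method = statement
            calls.append((var, method, env.get(var, frozenset())))
    return env, calls


# Hypothetical PHP:
#   if ($isWeb) { $printer = new HtmlPrinter(); } else { $printer = new PdfPrinter(); }
#   $copy = $printer;
#   $copy->printOrder();
PROGRAM = [
    ("if", [("new", "printer", "HtmlPrinter")], [("new", "printer", "PdfPrinter")]),
    ("assign", "copy", "printer"),
    ("call", "copy", "printOrder"),
]

if __name__ == "__main__":
    _, calls = analyze(PROGRAM)
    for var, method, classes in calls:
        # prints: $copy->printOrder() may call: ['HtmlPrinter', 'PdfPrinter']
        print(f"${var}->{method}() may call:", sorted(classes))
```

If the application only ever takes the web branch, the relation to `PdfPrinter` is redundant, but no real relation is ever missing; renaming `printOrder` in one printer class therefore has to consider the other one too, or ask the user.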

Development Tools

The Zend Development Tools group has invested effort in integrating this component into its next-generation IDE. I took it for a test drive with some examples I had in mind:

Rename Variable

Let’s say that a user holds an entry of the "Product" table in a variable named data. He realizes that the code would be clearer if this variable were named productEntry, so he invokes the refactoring tool on the variable and gives it the new name. He then gets a preview in which he can verify the changes and confirm them:




Figure 5: The preview pane, showing the changes that will be applied to the document to perform the refactoring.
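A preview like the one in Figure 5 is essentially a diff between the current text and the refactored text. The minimal Python sketch below uses the standard `difflib` module to produce such a preview for the data to productEntry rename; the PHP lines, the file name `product.php` and the regex-based rename are hypothetical stand-ins, not the Zend implementation. A tool would write the new text to disk only after the user confirms.

```python
import difflib
import re

# Hypothetical PHP file contents before the refactoring.
BEFORE = (
    "<?php\n"
    "$data = $table->find($id);\n"
    "echo $data->name;\n"
)


def rename_data(source: str) -> str:
    # Whole-word rename: $data changes, $database would not.
    return re.sub(r"\$data\b", "$productEntry", source)


def preview(path: str, before: str, after: str) -> str:
    """Return a unified diff of the pending change for the user to review."""
    return "".join(difflib.unified_diff(
        before.splitlines(keepends=True),
        after.splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
    ))


if __name__ == "__main__":
    print(preview("product.php", BEFORE, rename_data(BEFORE)))
```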

Rename Class Properties

A developer decides to rename a property (class member) in his Department model. Since he is using an OO methodology and the MVC pattern, the change must be applied to all the related files. He renames the department name field and gets the corresponding changes in the model, the controller and the viewer.



Figure 6: The changes in the EntryModel and the EntryViewer files.

Conclusions

Refactoring is the process of transforming code without changing its semantics. It is used frequently in Agile methodologies to support system evolution and improve maintainability. A refactoring "culture" in a team can improve communication between its members and help apply design decisions when needed.


A refactoring tool should be integrated into the standard development environment so that it can be used quickly. It must also be reasonably accurate, and it should highlight the places where a change may be unsafe.


Finally, it is clear that users need refactoring tools, and such a tool now seems closer than ever.

References

  1. Martin Fowler’s site on refactoring, http://www.refactoring.com
  2. A Meta-model for Language-Independent Refactoring, Sander Tichelaar, Stéphane Ducasse, Serge Demeyer, Oscar Nierstrasz.
  3. A Refactoring Tool for Smalltalk, Don Roberts, John Brant, Ralph Johnson.
  4. An Automated Refactoring Approach to Design Pattern-Based Program Transformations in Java Programs, Sang-Uk Jeon, Joon-Sang Lee, Doo-Hwan Bae.
  5. Soft Typing and Analyses on PHP Programs, Patrick Camphuijsen.
  6. SCG / FAMOOS / FAMIX web site, http://www.iam.unibe.ch/~famoos/FAMIX/

About Cal Evans

Many moons ago, at the tender age of 14, Cal touched his first computer. (We're using the term "computer" loosely here; it was a TRS-80 Model 1.) Since then his life has never been the same. He graduated from TRS-80s to Commodores and eventually to IBM PCs. For the past 10 years Cal has worked with PHP and MySQL on Linux, OS X and, when necessary, Windows. He has built on a variety of projects ranging in size from simple web pages to multi-million dollar web applications. When not banging his head on his monitor, attempting a blood sacrifice to get a particular piece of code working, he enjoys building and managing development teams using his widely imitated but never patented management style of "management by wandering around". Cal is currently based in Nashville, TN and is gainfully unemployed as the Chief Marketing Officer of Blue Parabola, LLC. Cal is happily married to wife 1.28, the lovely and talented Kathy. Together they have 2 kids who were both bright enough not to pursue a career in IT. Cal blogs at http://blog.calevans.com and is the founder and host of Day Camp 4 Developers.


9 Responses to “Refactoring PHP Code”

  1. _____anonymous_____ Says:

    hi cal,

    Snippet 3 seems to have an error. You pass in the $order var, but populate the $id var from a call to a method on the $item object var used in the previous code snippets. Where is the $item coming from to make a call against?

    If you have tests established prior to, or as you go through, your refactoring, they should catch the bug introduced, as the accompanying test will fail when you check that the refactor has not changed the external behaviour.

    Refactoring is good, but you get much more confidence if there is the relative comfort of having tests to confirm the changes will not break your app.

    nice article btw, please sir, can we have more?

    cheers,
    paul

  2. exceptione Says:

    Nice article! Unfortunately the link in reference 6 is broken.

  3. _____anonymous_____ Says:

    .. as is the one just above the references. Still, excellent read :)

  4. ajessica Says:

    exceptione,
    SCG / FAMOOS / FAMIX web site working fine for me

  5. lordspace Says:

    Nice article!

    I have noticed 2 things.

    1) You’ve missed the global keyword before $client
    2) I don’t think you should use the "__call" method directly, because underscores imply private/protected visibility.
    You should be able to call $client->getOrderStatus(array(‘id’ => $id)); and this will call internally the "__call" method.

    Slavi
    http://seofilter.org/
    http://devcha.blogspot.com/

  6. _____anonymous_____ Says:

    Hi,

    I am very surprised to see a guy who knows Smalltalk here.
    Nice post.

    That’s what disappoints me the most in the PHP IDEs: no refactoring, no type inference :( ..

  7. eriklars21 Says:

    Very nice article, I’ve learned a lot. It’s well researched and has great schematics.


  8. rud5g Says:

    update link reference 6:
    users need refactoring tools
    http://rc3.org/2006/02/18/php-is-bad/

  9. xxshirley Says:

    Very good article! It goes deep and has some nice pictures.