PHP Built-in Input Filtering


Security has become a top priority for many PHP developers. Its importance keeps growing in every project, open source or commercial. Every conference features a talk about security, and PHP security regularly makes the magazine covers.

Security in PHP applications is a large topic. This article explains one of the most important parts of any security policy: input (or data) filtering.

General security topics like XSS, SQL injection and other dangerous attacks will not be discussed here; see the end of this article for a short list of resources.

Don’t trust external data

Practically all applications (web, desktop, console) depend on external input to create output or to start an action. This input comes from a user or from another application (web service clients, bots, scanners, etc.). Rule #1 for every developer (you all know it, but it does not hurt to write it down once more) should be:

Filter All Foreign Data

Input filtering is one of the cornerstones of application security, independent of the language or environment. PHP provides a wide range of tools and functions to filter or validate data, but unlike other languages, it has no standard functions dedicated to filtering data (like the CGI module for Perl). The new Filter extension fills this gap.

What’s foreign data?

  • Anything from a form
  • Anything from $_GET, $_POST, $_REQUEST
  • Cookies ($_COOKIE)
  • Web services data
  • Files
  • Some server variables (e.g. $_SERVER['SERVER_NAME'])
  • Environment variables
  • Database query results

Filter supports GET, POST, cookie, server and environment variables as well as user-defined variables (server and environment support may not work in all SAPIs with Filter 0.11.0 or PHP 5.2.0).

Why Filter?

Testing, validating and filtering user input or custom data quickly becomes an annoying and repetitive task. It is easy to forget a test or to write an incomplete regular expression. The Filter extension aims to make data filtering less painful, as this simple example shows:

Check two integer _GET input values:
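A minimal sketch of what such a hand-written check might look like. The helper name get_int_manually and the parameter names "a" and "b" are hypothetical and only serve the illustration:

```php
<?php
// Hypothetical helper: returns an int when the value is a valid integer
// string, false when it is invalid, and null when it is missing.
function get_int_manually(array $source, $name)
{
    if (!isset($source[$name])) {
        return null;
    }
    $value = $source[$name];
    // Arrays (e.g. ?a[]=1) and non-numeric strings are rejected.
    // \z (not $) is used so that a trailing newline is rejected too.
    if (!is_string($value) || !preg_match('/^[+-]?\d+\z/', $value)) {
        return false;
    }
    // Note: integer overflow is not checked here, which is exactly the
    // kind of test that is easy to forget.
    return (int) $value;
}

$a = get_int_manually($_GET, 'a');
$b = get_int_manually($_GET, 'b');
```

Even this short version has to deal with missing keys, array values and the regular expression anchors by hand.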

and using Filter’s filter_input:
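A minimal sketch of the filter_input version, assuming two hypothetical GET parameters named "a" and "b". The $status variable is only there to make the three possible outcomes visible:

```php
<?php
// filter_input returns the integer on success, false if the value is not
// a valid integer, and null if the parameter is not present at all.
$a = filter_input(INPUT_GET, 'a', FILTER_VALIDATE_INT);
$b = filter_input(INPUT_GET, 'b', FILTER_VALIDATE_INT);

if ($a === null || $b === null) {
    $status = 'missing';   // at least one parameter was not sent
} elseif ($a === false || $b === false) {
    $status = 'invalid';   // at least one parameter is not an integer
} else {
    $status = 'ok';        // both are real PHP integers, safe to use
    $sum = $a + $b;
}
```

When run from the command line there is no GET data, so both calls return null and the status is "missing".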

How does it work?

The process of transforming an input request into a set of userland variables is done by the SAPI layer. Without going into the details, the SAPI layer is the interface between the wild world and the PHP engine. The engine fetches external data (ENV, SERVER, COOKIE, GET or POST) from the SAPI and transforms it into the well-known superglobals or uses it in related functions like getenv.

Filter is active both at the SAPI level and in the engine. The SAPI layer supports custom filtering operations: Filter's functions are called for each piece of external data processed by the engine, before the script gets its hands on it. How the SAPI filter processes the data is defined by the default filter, which is configurable in your php.ini.

Given a simple POST request like:

POST /myform.php?myfield=<script>hola</script>

The diagrams below explain the difference between normal operation (like PHP 5.1, without Filter enabled) and a PHP build that includes Filter support:

 

Prerequisites

Filter works out of the box with PHP 5.1.0 or later. The extension is bundled with PHP 5.2.0 or later.

Installation

PHP 5.2.0 or later has Filter bundled and enabled by default. There is no need to install it manually.

Unix/Linux:

or

Windows:

Download the filter.dll for your PHP version from http://pecl4win.php.net

 

For all platforms, add extension=filter.so to your php.ini and restart your web server.

For more information about PECL installation procedures, please read the PHP manual.

General considerations

Filter knows two kinds of filters:

  1. sanitizing filters
    • Allow or disallow characters in a string
    • Do not care about the data format
    • Always return a string
  2. logical filters
    • Strong analysis of the data
    • Know the formats
    • Return the expected type on success

Filters are invoked using one of these functions:

  • filter_input, fetches one input variable
  • filter_input_array, fetches many input variables in a single call
  • filter_var, filters one variable
  • filter_var_array, filters many variables in one call

On success, the value is returned filtered and with the right type. The functions return FALSE if the filter fails (bad characters, out of range, …) and NULL if the variable is not set. With the optional flag FILTER_NULL_ON_FAILURE the behavior is reversed: NULL is returned on failure and FALSE if the variable is not set.
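The return states described above can be sketched as follows. The field name "no_such_field" is hypothetical and is assumed to be absent from the request:

```php
<?php
// Success: the value comes back with the expected type.
$valid = filter_var('42', FILTER_VALIDATE_INT);

// Failure: false by default, null with FILTER_NULL_ON_FAILURE.
$invalid        = filter_var('4x2', FILTER_VALIDATE_INT);
$invalidFlagged = filter_var('4x2', FILTER_VALIDATE_INT, FILTER_NULL_ON_FAILURE);

// Variable not set: null by default, false with FILTER_NULL_ON_FAILURE.
$missing        = filter_input(INPUT_GET, 'no_such_field', FILTER_VALIDATE_INT);
$missingFlagged = filter_input(INPUT_GET, 'no_such_field', FILTER_VALIDATE_INT, FILTER_NULL_ON_FAILURE);
```

Because the states differ by type as well as by value, compare them with === rather than ==, otherwise a valid 0 is indistinguishable from FALSE or NULL.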

With these three different states, it becomes easy to get rid of the isset or is_numeric mess.

A simple form using a logical filter

And the little script to process it:

filter_has_var tests whether a given variable is set. It performs no validation; it only tells whether the variable exists. It is the equivalent of isset($_POST['submit']). filter_input fetches one single value and returns it filtered. In this example an integer is expected.
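A minimal sketch of such a processing script, assuming a form that posts a "submit" button and an "age" field (the messages are illustrative only):

```php
<?php
if (!filter_has_var(INPUT_POST, 'submit')) {
    // The form has not been submitted yet.
    $message = 'Please fill in the form.';
} else {
    $age = filter_input(INPUT_POST, 'age', FILTER_VALIDATE_INT);

    if ($age === null) {
        $message = 'The age field is missing.';
    } elseif ($age === false) {
        $message = 'The age must be a whole number.';
    } else {
        // $age is a real PHP integer at this point.
        $message = "You are $age years old.";
    }
}

echo $message, "\n";
```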

If this script displays the last “Tintin” cartoon, the age must be between 7 and 77 :). The numeric filters accept minimum and maximum range options:
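A sketch of the range check, assuming the same "age" field. The options array is also run through filter_var on literal values to show the effect outside a web request:

```php
<?php
// min_range and max_range are inclusive bounds for FILTER_VALIDATE_INT.
$range = array(
    'options' => array('min_range' => 7, 'max_range' => 77),
);

// In the form-processing script:
$age = filter_input(INPUT_POST, 'age', FILTER_VALIDATE_INT, $range);

// The same options applied to plain values:
$inRange  = filter_var('42', FILTER_VALIDATE_INT, $range); // integer 42
$tooYoung = filter_var('5', FILTER_VALIDATE_INT, $range);  // false
$tooOld   = filter_var('78', FILTER_VALIDATE_INT, $range); // false
```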

To add another field, for example an email address, the procedure is identical:
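A sketch assuming the field is named "email". The addresses passed to filter_var are made-up samples:

```php
<?php
// In the form-processing script:
$email = filter_input(INPUT_POST, 'email', FILTER_VALIDATE_EMAIL);

// The same filter applied to plain values:
$good = filter_var('tintin@example.com', FILTER_VALIDATE_EMAIL); // the address itself
$bad  = filter_var('not-an-email', FILTER_VALIDATE_EMAIL);       // false
```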

Just like the other field, $email will be FALSE if the email is invalid and NULL if the email field is missing (for example a lazy spam bot did not detect it).

A simple form using a sanitizing filter

The following filter_input call cleans up the “name” variable and returns it in a usable form:
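A sketch of that call, assuming the form field is named "name". A literal sample string is filtered with filter_var as well, so the effect is visible without a form:

```php
<?php
// In the form-processing script:
$name = filter_input(INPUT_POST, 'name', FILTER_SANITIZE_SPECIAL_CHARS);

// FILTER_SANITIZE_SPECIAL_CHARS replaces & < > and quotes with numeric
// HTML entities; other characters are left untouched.
$sample = filter_var('Tintin & <i>Snowy</i>', FILTER_SANITIZE_SPECIAL_CHARS);
// $sample is now: Tintin &#38; &#60;i&#62;Snowy&#60;/i&#62;
```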

If the “name” contains a value similar to:

Johnny Weißmüller <b>Jr</b>

FILTER_SANITIZE_SPECIAL_CHARS will return:

Hello Johnny Weißmüller &#60;b&#62;Jr&#60;/b&#62;.

A nicer filter for this field would be:
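A sketch of such a call, assuming the “string” filter combined with the FILTER_FLAG_ENCODE_HIGH flag and an ISO-8859-1 encoded value (with UTF-8 input, every byte of a multi-byte character would be encoded separately). Note that FILTER_SANITIZE_STRING is deprecated as of PHP 8.1, so this reflects the extension as described in this article rather than current best practice:

```php
<?php
// In the form-processing script:
$name = filter_input(INPUT_POST, 'name', FILTER_SANITIZE_STRING, FILTER_FLAG_ENCODE_HIGH);

// The same filter on a literal ISO-8859-1 string (\xDF is ß, \xFC is ü).
// Tags are stripped and characters above ASCII 127 become numeric entities.
$latin1 = "Johnny Wei\xDFm\xFCller <b>Jr</b>";
$clean  = filter_var($latin1, FILTER_SANITIZE_STRING, FILTER_FLAG_ENCODE_HIGH);
```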

and it returns

Hello Johnny Wei&#223;m&#252;ller Jr.

The sanitizing filters accept many options and flags. In this example, we use the “string” filter (also called “stripped”), which accepts flags for stripping or encoding low and high character values. A complete list of filters and flags is available in the Filter manual.

How to fetch all values in one call?

and the script to process it:

As the script shows, fetching all the values you need is as easy as fetching a single value. The only difference is how options and flags are given: an array must be used as soon as a field needs options or flags.
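A sketch of such a definition array, assuming a form with the fields "age", "email", "name" and a multi-valued "favourites" field. The $sample array is hypothetical data, filtered with filter_var_array so the result can be inspected outside a web request:

```php
<?php
$definition = array(
    // Options or flags require the array form.
    'age' => array(
        'filter'  => FILTER_VALIDATE_INT,
        'options' => array('min_range' => 7, 'max_range' => 77),
    ),
    // A bare filter constant is enough when no options are needed.
    'email' => FILTER_VALIDATE_EMAIL,
    'name'  => FILTER_SANITIZE_SPECIAL_CHARS,
    // FILTER_REQUIRE_ARRAY: the field must be an array, each element is filtered.
    'favourites' => array(
        'filter' => FILTER_SANITIZE_SPECIAL_CHARS,
        'flags'  => FILTER_REQUIRE_ARRAY,
    ),
);

// In the form-processing script:
$input = filter_input_array(INPUT_POST, $definition);

// The same definition applied to a plain array:
$sample = array(
    'age'        => '12',
    'email'      => 'tintin@example.com',
    'name'       => 'Tintin',
    'favourites' => array('Snowy', '<b>Haddock</b>'),
);
$result = filter_var_array($sample, $definition);
```

Each key of the result holds the filtered value, FALSE if the filter failed, or NULL if the field was missing.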

Using such an array can look like overkill, but it greatly improves the readability of the input parsing code in a script. Adding, removing or editing input rules can be done in seconds.

Complex processing using callback

Instead of doing a simple string validation for the “favourites”, a user function will be used. The “options” argument is used to define the callback using the same syntax as the PHP call_user_func.

The function will be called once if the variable is a scalar; if the variable is an array, it will be called once for each element.

Filter does not sanitize or validate the input before or after the callback. But filter_var can be used inside your callback function or method as shown in this example.
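A sketch of a callback filter. The function name clean_favourite and the sample values are hypothetical; the callback trims each value and then sanitizes it with filter_var, since Filter does not do so on its own:

```php
<?php
// Hypothetical callback: receives one value and returns the cleaned value.
function clean_favourite($value)
{
    return filter_var(trim($value), FILTER_SANITIZE_SPECIAL_CHARS);
}

// "options" takes the callback in the same form as call_user_func.
$callback = array('options' => 'clean_favourite');

// Scalar input: the callback is called once.
$one = filter_var('  Snowy ', FILTER_CALLBACK, $callback);

// Array input: the callback is called once per element.
$many = filter_var(array('  Snowy ', '<b>Haddock</b>'), FILTER_CALLBACK, $callback);
```

The same array can be used as an entry in a filter_input_array definition by adding 'filter' => FILTER_CALLBACK to it.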

Why no OO?

Filter does not provide any object oriented interface. The current API is flexible enough to add any kind of filters, support unicode (more on this topic in another article) or to integrate into your favourite framework or MVC application.

If someone writes a generic OO wrapper for the filter functions, I would be happy to include it in the examples directory, but do not wait for an OO interface; it is not going to come anytime soon.

Default filter, PHP 5.1.x and shared host

The default filter is set to unsafe_raw in PHP 5.2.0 or later, and this default is not going to change. An attempt was made to set it to string, but that broke far too many applications and made the migration process a pain. If you plan to use another default filter, be sure to first fix all your applications or loudly warn your users.
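On a shared host, the default filter in effect can be checked at runtime. A small sketch, assuming the setting is exposed under the php.ini name filter.default:

```php
<?php
// Reads the php.ini setting that selects the default filter.
// 'unsafe_raw' means that input reaches the script unmodified.
$default = ini_get('filter.default');

if ($default !== false && $default !== 'unsafe_raw') {
    // A non-default filter already alters $_GET, $_POST, etc.
    // before the script sees them.
    echo "Input is pre-filtered with: $default\n";
}
```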

Filter works out of the box on PHP 5.1.x. I recommend installing it by default on any PHP 5.1.x setup (without a default filter). I’m sure all applications will soon rely on it. It does not solve all security issues, but it gets rid of the common mistakes and bad usages.

Filter Links

Other resources about PHP and Security