Adding Multi-Language Support to Web Applications with PHP and PEAR

Lost In Translation

If you’re a developer building a Web application for global consumption, presenting it in a single language is usually insufficient. To make an application accessible to a larger audience, it’s necessary to present it in multiple languages, so that users from different areas of the world can make use of it. This is the approach followed by some of today’s most popular applications; look inside Flickr, Blogger, Google or Yahoo!, and chances are you’ll find a version in your local language or dialect.

If you’re using PHP, adding multi-language support to a Web application is quite easy, especially since the PHP Extension and Application Repository (PEAR) includes some ready-made code to help you get started. And that’s where this article comes in. Over the next few pages, I’ll introduce you to PEAR’s Translation2 package, and show you how you can use it to add multi-language support to your application.

Speaking The Language

The PEAR Translation2 package provides a framework for managing strings in different languages, and for retrieving and interpolating those strings into a Web page at run-time. It can read language strings from different data sources, including databases and XML files, and can also be easily extended to support custom functions. It is currently maintained by Lorenzo Alberton, Ian Eure and Michael Wallner, and is freely available under a BSD license.

Before we get started, it’s important to state the assumptions this tutorial makes:

1. First, it assumes that you understand HTML, know the basics of PHP programming with objects, and are familiar with using SQL result sets and XML data in PHP.

2. Second, it assumes that you have an Apache/PHP development environment and a MySQL RDBMS already set up.

3. Third, it assumes that you’ve managed to successfully install the Translation2 package. This article uses v2.0.1 of Translation2.

To install this package using the PEAR installer, run pear install Translation2 at your shell prompt.

The Translation2 package also depends on various other packages, which you should install in order to run the examples in this article: MDB2, Cache_Lite and XML_Serializer. You can download these packages using the PEAR installer, as described above.

Saying Hello

Assuming you’re ready to go with all of the above, let’s begin with a simple example that demonstrates how Translation2 works. While the package supports a number of different data source configurations, the simplest is a two-table database: one to store a list of available languages, and the other to store translated strings for each language.

To create these tables, log into your MySQL server and run this SQL file. Once done, the tables should look something like this:

As you can see, the 'langs' table introduces four languages – English, French, German and Hindi – while the 'strings' table contains translations for two simple phrases in each of these languages. Note that languages are identified by their two-character ISO code, while strings have both a unique string identifier (strings.string_id) and a non-unique page identifier (strings.page_id); the latter makes it possible to group a set of related strings together. Further, both tables specify MySQL’s UTF-8 character set, to ensure that non-Latin characters are not corrupted.

Note that all translated strings used in this and subsequent examples were generated using Google Translate.

Once you’ve got the data source set up, the next step is to use the Translation2 package to retrieve these strings. Here’s an example that illustrates it in action:

The first step here is to include the Translation2 class, and initialize an options array that contains information on how the data source is configured. Since the data source in this case is a database, the options array contains the table and field names for the language and string tables set up earlier. This options array is then passed to the Translation2::factory() method, together with the name of the data container and its DSN.

Once an instance of the Translation2 class is initialized, accessing individual strings is very easy: simply call the instance’s get() method with the string identifier, page identifier (optional) and language identifier (also optional). The Translation2 package will use this information to connect to the database and retrieve the translated string in the specified language.

To make things simpler, you can set a default page identifier and language identifier for all get() calls, via the setPage() and setLang() methods. In the example above, the setLang() method is used to set the default language to either the code passed as a GET parameter or, failing that, to English. The setCharset() method sets the default character set to use – in this case, UTF-8.
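The steps described above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a definitive listing: the DSN, the column names in translationParams(), the helper function names and the 'hello' / 'greetings' identifiers are all hypothetical, and the check against a list of supported languages is an extra safeguard that the text does not mention.

```php
<?php
// Return the requested language code if it is supported, else the default.
// (The whitelist check is an addition for safety, not part of Translation2.)
function chooseLanguage($requested, array $available, $default = 'en')
{
    return in_array($requested, $available, true) ? $requested : $default;
}

// Table and column settings for the database container. The table names
// follow the text; the language column names are assumptions.
function translationParams()
{
    return array(
        'langs_avail_table'     => 'langs',
        'lang_id_col'           => 'id',
        'lang_name_col'         => 'name',
        'strings_default_table' => 'strings',
        'string_id_col'         => 'string_id',
        'string_page_id_col'    => 'page_id',
        'string_text_col'       => '%s',   // one text column per language code
    );
}

// Create a Translation2 object backed by MDB2 and set its defaults.
// Needs the PEAR packages installed; it is defined here but not invoked.
function openTranslator($dsn, $lang)
{
    require_once 'Translation2.php';
    $tr = Translation2::factory('MDB2', $dsn, translationParams());
    if (PEAR::isError($tr)) {
        throw new RuntimeException($tr->getMessage());
    }
    $tr->setCharset('utf8');
    $tr->setLang($lang);
    return $tr;
}

// Typical use in a page (hypothetical DSN and string identifiers):
// $lang = chooseLanguage(isset($_GET['lang']) ? $_GET['lang'] : 'en',
//                        array('en', 'fr', 'de', 'hi'));
// $tr = openTranslator('mysql://user:pass@localhost/dbname', $lang);
// echo $tr->get('hello', 'greetings');
```

Validating the GET parameter before handing it to setLang() means an unknown code simply falls back to English instead of producing an empty page.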

To see this script in action, access it through your Web browser, and you should see something like this:

Now, append the string ?lang=fr to the URL and refresh the page. You should see the same text in French:

Switch to Hindi by altering the GET variable to ?lang=hi, and you should see the Hindi version:

An alternative to the get() method is the getPage() method, which returns all the strings in a given group (or “page”) as an associative array. This can be more efficient when retrieving a large number of strings from the database. The next example illustrates this method:
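As a rough sketch of the getPage() approach, the helper below fetches a whole group in one call and renders it as an HTML list. The function name renderPage() and the list markup are assumptions made for illustration; $tr is expected to be an already initialized Translation2 object.

```php
<?php
// Render every string in one page group as an HTML list.
// $tr is expected to be an initialized Translation2 object; getPage()
// is assumed to return an array of string id => translated text.
function renderPage($tr, $pageId)
{
    $strings = $tr->getPage($pageId);   // one call for the whole group
    $html = "<ul>\n";
    foreach ($strings as $id => $text) {
        $html .= '<li>' . htmlspecialchars($id, ENT_QUOTES, 'UTF-8') . ': '
               . htmlspecialchars($text, ENT_QUOTES, 'UTF-8') . "</li>\n";
    }
    return $html . "</ul>\n";
}

// Example call with a hypothetical page identifier:
// echo renderPage($tr, 'greetings');
```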

Here’s the output:

Going To The Source

In the previous examples, a single table contained the strings for all languages. However, if you prefer, you can store the strings for each language in a separate table. Use this SQL file to create the new set of tables. Once done, the tables should look something like this:

Now, you need to let Translation2 know where it can find strings for each language. This is done by adjusting the 'strings_tables' key of the options array. Here’s the revised code:
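A revised options array for this layout might look something like the sketch below. The "strings_<code>" table naming pattern, the 'string' text column and the helper stringsTablesFor() are assumptions for illustration; substitute the names used by your own tables.

```php
<?php
// Build the 'strings_tables' map: language code => table holding its strings.
// The "strings_<code>" naming pattern is an assumption for this sketch.
function stringsTablesFor(array $langCodes, $prefix = 'strings_')
{
    $tables = array();
    foreach ($langCodes as $code) {
        $tables[$code] = $prefix . $code;
    }
    return $tables;
}

// Options array for one table per language.
$params = array(
    'langs_avail_table'  => 'langs',
    'strings_tables'     => stringsTablesFor(array('en', 'fr', 'de', 'hi')),
    'string_id_col'      => 'string_id',
    'string_page_id_col' => 'page_id',
    'string_text_col'    => 'string',   // hypothetical text column name
);
```

With one table per language, the text column has the same name in every table, so the option is a fixed name rather than a per-language pattern.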

If you prefer, you can also use a flat file in XML format as the data source. This XML file contains both language meta-information and translated strings. Here’s an example of what this file should look like:

Create a Unicode-compliant file with the above contents in your development directory (or download this one), and adjust the script above to use an XML container instead of a database container, as below:
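Switching to the XML container could look roughly like this. The 'filename' and 'save_on_shutdown' option names are taken from my reading of the package documentation, so check them against your installed version; the file name i18n.xml and the helper names are hypothetical.

```php
<?php
// Options for the XML container. Option names are as I understand them
// from the package documentation; the file name is hypothetical.
function xmlContainerOptions($filename)
{
    return array(
        'filename'         => $filename,
        'save_on_shutdown' => true,
    );
}

// Create a Translation2 object that reads its strings from an XML file.
// Needs the PEAR packages installed; it is defined here but not invoked.
function openXmlTranslator($filename, $lang)
{
    require_once 'Translation2.php';
    $tr = Translation2::factory('XML', xmlContainerOptions($filename));
    if (PEAR::isError($tr)) {
        throw new RuntimeException($tr->getMessage());
    }
    $tr->setLang($lang);
    return $tr;
}

// Example call:
// $tr = openXmlTranslator('i18n.xml', 'fr');
```

Note that no DSN or table parameters are needed here, since the file carries both the language list and the strings.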

Silly Substitutions

An interesting feature of the Translation2 package is its support for string placeholders. This makes it possible to replace segments of a translated string with different values, by using a placeholder variable within the string. This feature is particularly useful when interpolating user-supplied input into translated strings at run-time.

To illustrate how this works, consider an example. Suppose you have the following Web form:

Now, further suppose that when the form is submitted, you want to display the response "Did you say [____], silly [boy/girl]?" in the selected language, as below:

With Translation2’s support for placeholders, this isn’t actually very difficult.

1. The first step is to generate a localized version of the string in each language, and mark the variable parts of the string with placeholders. So, for example, the English version of the string would read "Did you say '&&q&&', silly &&s&&?", while the French version would read "Vous avez dit '&&q&&', stupide &&s&&?". Note that the && delimiters serve as markers for the placeholder variables within the string.

It’s also a good idea to store strings for the words “boy” and “girl” in each language, as separate records in the storage container.

Use this SQL file to create the database. Here’s an example of what it might look like:

2. Once the strings have been set up, the next step is to assign values to the variable placeholders at run-time. This is accomplished via the setParams() method, which accepts an array of variable-value pairs and correctly replaces the placeholder variables with the corresponding values when the string is retrieved with a call to get().

To illustrate, consider that the following lines of code would generate the string "Did you say 'baaa', silly girl?":
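The substitution can be sketched in two ways. fillPlaceholders() is a stand-alone illustration of the idea (it is not the package's own code), while sillyReply() shows how the same result might be obtained with setParams() and get(). The function names and the 'reply' and 'silly' identifiers are hypothetical, and $tr is expected to be an initialized Translation2 object.

```php
<?php
// Stand-alone illustration of the substitution idea: replace every
// &&name&& marker with the matching value. Not the package's own code.
function fillPlaceholders($template, array $values)
{
    $pairs = array();
    foreach ($values as $name => $value) {
        $pairs['&&' . $name . '&&'] = $value;
    }
    return strtr($template, $pairs);
}

// The same result through the package: set the values, then fetch the
// string. $genderId is the string id of the word to insert ('boy'/'girl').
function sillyReply($tr, $query, $genderId)
{
    $tr->setParams(array(
        'q' => $query,
        's' => $tr->get($genderId, 'silly'),   // localized "boy" or "girl"
    ));
    return $tr->get('reply', 'silly');
}

// Example call of the stand-alone helper:
// echo fillPlaceholders("Did you say '&&q&&', silly &&s&&?",
//                       array('q' => 'baaa', 's' => 'girl'));
```

Since the value of q comes from the user, it is a good idea to pass the final string through htmlspecialchars() before printing it.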

Here’s the complete script for you to look at:

In addition to the methods already discussed, this script also makes use of the getLangs() method, which returns a list of all available languages. This is used to populate the language drop-down list in the Web form.

Here are some examples of this script in action:

Add And Subtract

The Translation2 package also provides an administrative API for language and string manipulation. This API makes it possible to add, update and remove languages and strings in the storage container.

To illustrate, consider the next example, which uses this API to add a new language via the addLang() method:

Here, instead of initializing the standard Translation2 object, the script initializes a Translation2_Admin object, and passes it the usual array of options with database and table information. Next, a new array containing information on the language to be added is created, with keys corresponding to the fields of the language table, and the object’s addLang() method is invoked with this array as argument. Behind the scenes, the addLang() method adds a new record to the 'langs' table, and modifies the 'strings' table with an additional column for the new language.
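A minimal sketch of this step follows. The key names in the language record are as I understand them from the package documentation, and the helper names and default values (including Spanish as the new language in the example call) are assumptions made for illustration.

```php
<?php
// Build the record describing a new language. The keys mirror the fields
// of the language table; 'table_name' is where its strings will be stored.
function newLanguageRecord($code, $name, $table = 'strings')
{
    return array(
        'lang_id'    => $code,
        'table_name' => $table,
        'name'       => $name,
        'meta'       => '',
        'error_text' => 'not available',
        'encoding'   => 'UTF-8',
    );
}

// Register the language through the admin API and hand back the result,
// which the caller can check with PEAR::isError().
// $admin is expected to be an initialized Translation2_Admin object, e.g.
//   require_once 'Translation2/Admin.php';
//   $admin = Translation2_Admin::factory('MDB2', $dsn, $params);
function registerLanguage($admin, array $record)
{
    return $admin->addLang($record);
}

// Example call with a hypothetical new language:
// registerLanguage($admin, newLanguageRecord('es', 'Spanish'));
```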

You can check this by examining the database again – you should see entries for the new language:

Just as you can add a language, so too can you add strings in that language. Consider the next example, which shows you how to do this via the object’s add() method:
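Something along these lines would add one string in several languages at once. The helper saveString() and the identifiers in the example call are hypothetical, and filtering the translations against a list of known language codes is an extra precaution rather than something the package requires.

```php
<?php
// Store one string in several languages at once.
// $admin:      initialized Translation2_Admin object.
// $texts:      array of language code => translated text.
// $knownLangs: language codes that exist in the storage container.
// Returns false when no usable translation is left after filtering,
// otherwise whatever add() returns.
function saveString($admin, $stringId, $pageId, array $texts, array $knownLangs)
{
    // Drop translations for languages the container does not know about.
    $texts = array_intersect_key($texts, array_flip($knownLangs));
    if (!$texts) {
        return false;
    }
    return $admin->add($stringId, $pageId, $texts);
}

// Example call with hypothetical identifiers:
// saveString($admin, 'goodbye', 'greetings',
//            array('en' => 'Goodbye', 'fr' => 'Au revoir'),
//            array('en', 'fr', 'de', 'hi'));
```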

And here’s what the record looks like:

To update the record for a language, call the updateLang() method with a new array as argument, as below:

To update the record for a particular string, simply call add() again with the new string values; Translation2_Admin will automatically locate and update the existing record for you.

To remove a language, call the removeLang() method with the language code, as below:

Note that removing a language will also remove all strings in that language from the storage container. If, instead, you only want to remove a specific string, call the remove() method with the string and page identifiers as arguments:
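As a sketch, the helper below removes several strings from one page group and reports any that could not be removed. It assumes that remove() returns true on success; the helper name and the identifiers in the example call are hypothetical.

```php
<?php
// Remove several strings from one page group and report the failures.
// $admin is expected to be an initialized Translation2_Admin object.
// Returns the identifiers whose removal did not report success
// (assuming remove() returns true when it succeeds).
function removeStrings($admin, $pageId, array $stringIds)
{
    $failed = array();
    foreach ($stringIds as $id) {
        if ($admin->remove($id, $pageId) !== true) {
            $failed[] = $id;
        }
    }
    return $failed;
}

// Example calls with hypothetical identifiers:
// $failed = removeStrings($admin, 'greetings', array('hello', 'goodbye'));
// $admin->removeLang('es');   // removes the language and all its strings
```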

Navigating The Language Minefield

Now that you have a handle on what Translation2 can do, let’s look at it in the context of a real-world application: building a navigation menu for a Web site that must cater to users of different European countries.

Begin by using this SQL file to create the example database, which contains strings for main menu items in four languages: English, French, German and Dutch. Your example database should look like this:

Next, create a Web page that dynamically builds a navigation menu in the selected language, by retrieving values from this database using the Translation2 package. Here’s the code:

Nothing very complicated here: the script checks for the $_GET['lang'] variable and uses the selected language (or English, if none is specified) with the getPage() method to retrieve all the navigation-related strings as an associative array. These strings are then used to build the main menu. The user can switch to a different language simply by selecting the corresponding name from the language menu.
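The rendering part of such a page might be sketched as below. The two functions work on plain arrays, so they can be fed with the results of getPage() and getLangs(); the function names, the "/<id>.php" link pattern and the 'menu' page identifier are assumptions made for illustration.

```php
<?php
// Escape text for safe inclusion in HTML.
function h($text)
{
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

// Build the main menu from the strings of one page group.
// $items: string id => translated label (e.g. the result of getPage()).
// The "/<id>.php" link pattern is purely illustrative.
function renderMenu(array $items)
{
    $html = "<ul>\n";
    foreach ($items as $id => $label) {
        $html .= '<li><a href="/' . h($id) . '.php">' . h($label) . "</a></li>\n";
    }
    return $html . "</ul>\n";
}

// Build the language switcher; the current language is shown unlinked.
// $langs: language code => language name (e.g. the result of getLangs()).
function renderLanguageLinks(array $langs, $current)
{
    $links = array();
    foreach ($langs as $code => $name) {
        $links[] = ($code === $current)
            ? '<strong>' . h($name) . '</strong>'
            : '<a href="?lang=' . h($code) . '">' . h($name) . '</a>';
    }
    return implode(' | ', $links);
}

// Example use with an initialized Translation2 object $tr:
// echo renderMenu($tr->getPage('menu'));
// echo renderLanguageLinks($tr->getLangs(), $lang);
```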

Here’s what the output looks like in different languages:

Now, you’ll realize that, since the navigation menu is usually present on each page of the Web site, each request to the Web server will also generate a query to the database server for translated strings. This is not very efficient, especially when you consider that the translated strings are not likely to change frequently. This fact makes these strings an ideal target for caching…and coincidentally, the Translation2 package includes built-in support for the PEAR Cache_Lite package, via something called a “decorator”.

Decorators are add-on modules to the Translation2 package, allowing it to be easily extended with custom functions. A number of built-in decorators ship with the package; the Cache_Lite decorator is one of them. To activate it, simply invoke it via the object’s getDecorator() method, and set caching options (disk location, refresh interval) using the setOptions() method, as in the revision below:
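A hedged sketch of this change is shown below. The decorator name 'CacheLiteFunction' and the 'cacheDir' and 'lifeTime' option names are as I understand them from the package documentation, so verify them against your installed version; the helper name and the cache path are hypothetical. This sketch sets the options one at a time with setOption().

```php
<?php
// Wrap a Translation2 object in the Cache_Lite-based decorator.
// 'CacheLiteFunction' is the decorator name as I understand it from the
// package documentation; verify it against your installed version.
function withDiskCache($tr, $cacheDir, $lifetimeSeconds = 86400)
{
    $cached = $tr->getDecorator('CacheLiteFunction');
    // Cache_Lite expects the directory to end with a slash.
    $cached->setOption('cacheDir', rtrim($cacheDir, '/') . '/');
    // Number of seconds before a cached entry is refreshed.
    $cached->setOption('lifeTime', $lifetimeSeconds);
    return $cached;
}

// Example use with an initialized Translation2 object $tr:
// $tr = withDiskCache($tr, '/tmp/translation-cache', 3600);
// $menu = $tr->getPage('menu');   // served from disk after the first hit
```

The decorated object is used exactly like the original one, so the rest of the script does not need to change.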

As a result of this change, translated strings will be stored in a disk cache and served from this cache on subsequent requests, thereby reducing the load on the database server.

A number of other decorators also ship with the Translation2 package, including ones for memory-based caching, HTML entity translation, language fallback or error text display for unavailable translations, and encoding conversion. You can read more about them in the package documentation.

As these examples illustrate, the Translation2 package provides a sophisticated and flexible framework for supporting multiple languages in a Web application. As such, it’s a valuable addition to any programmer’s toolkit, and one that you should definitely take some time out to experiment with. Play with it sometime, and see what you think!

Copyright Melonfire, 2009. All rights reserved.