SQL Query Caching

January 22, 2003

Uncategorized

Intended Audience
Overview
Prerequisites
Caching SQL Query Results
Why cache query results?
Creating an SQL Query
Deciding when to update the cache
Caching the results
Using the Cache
Putting it all together
Possible Additions
The Script
About the Author


Intended Audience


This tutorial is intended for the PHP programmer interested in
caching SQL queries to reduce the overhead of a database connection and query,
and to improve script performance.


Overview


Many sites use a database backend as a data store for the
site. Whether the database contains product information, category structure,
articles or a guest book, some of the data is likely to be quite static and will
greatly benefit from a caching system.

Such a system would cache the results of an SQL query into a
file stored on the system and hence improve the response time by avoiding the
need to make a database connection, forming the query, executing it and
retrieving the results.

On systems where the database does not reside on the same
machine as the web server and requires a remote connection (TCP or similar), or
where large amounts of data are retrieved from the database, you stand to gain
even more in terms of response times and resources used.


Prerequisites


This tutorial will use MySQL as the database. You will need
MySQL installed (available from

www.mysql.com) and PHP MySQL extension
enabled (it is enabled by default).

You will need to know the basics of the SQL (Structured Query Language) in order
to query the database.


Caching SQL Query Results



Why cache query results?


Caching query results can dramatically improve script
execution time and resource requirements.

Caching SQL results also allows you to carry out post
processing on the data. This may not be possible if you use file caching to
cache the outputs of the entire script (HTML output caching).

When you execute an SQL query, the typical process undertaken is:

  • Connect to the
    database
  • Prepare SQL
    query
  • Send query to
    database
  • Retrieve the
    results
  • Close the database
    connection
The above is quite resource intensive and can
adversely affect the script performance. This can be further compounded by
factors such as amount of data retrieved and location of database
server.

Although persistent connections may improve the overhead of
connecting to the database, they are more memory intensive and the overall time
saved will be very little if a large amount of data is retrieved.


Creating an SQL Query

SQL (Structured Query Language) queries are used as an
interface to manipulate a database and its contents. SQL can be used to define
and edit the table structure, insert data into the tables, update and delete
information from the tables.

SQL is the language used to communicate with the database and
in most PHP database extensions (MySQL, ODBC, Oracle etc), the extension manages
the process of passing the SQL query to the database.

In this tutorial, only the select statement is used to
retrieve data from the database. This data is cached and later used as the data
source.


Deciding when to update the cache


Caching can take a few forms according to the program’s needs. The 3 most common
approaches are:

  • Time triggered caching (expiry
    timestamp).
  • Information change triggered caching
    (sensing data has changed and updating the cache
    accordingly).
  • Manual triggered caching (manually
    letting the system know information is outdated and forcing a new cache
    generation).
Your caching requirements may be one or a
combination of the mechanisms above. In this tutorial, the time-triggered
approach is discussed. However, a combination of all 3 approaches can be used as
an overall caching policy.


Caching the results


The basics to caching is using the
serialize() and
unserialize() PHP functions.

The serialize()
function can be used to store PHP values without losing their types and
structure. In fact, the PHP session extension uses the serialized representation
of the variables in a file to store the contents of the session variable
($_SESSION).

The unserialize() function reverses the operation and turns
the serialized string back into its original structure and data
contents.

In this example, an e-commerce store is used. The store has 2
basic tables, categories and products. While product information may change
daily, categories remain fairly static.

For product display, you can use an output caching script to store the resultant
HTML output in a file to be called up. However, categories may need some post
processing. For example, all categories are displayed and according to the category_id
variable that is passed to the script ($_REQUEST['category_id']) you may wish
to highlight the current category selected.

The categories table has the following format:

+———————-+——————+—–+—————-+
| Field | Type | Key | Extra |
+———————-+——————+—–+—————-+
| category_id | int(10) unsigned | PRI | auto_increment |
| category_name | varchar(255) | | |
| category_description | text | | |
+———————-+——————+—–+—————-+
In
this example, the time limited caching technique is used where the cached SQL
output is considered outdated after a set amount of time. In this particular
example, 24 hours are used.

Serialize example:

  • Connect to
    database
  • Execute
    query
  • Get all results into an array so you can
    access them later.
  • Serialize
    array
  • Save serialized array to file

$file 'sql_cache.txt';

$link mysql_connect('localhost','username','password')

    or die (
mysql_error());

mysql_select_db('shop')

    or die (
mysql_error());

/* form SQL query */

$query "SELECT * FROM categories";

$result mysql_query($query)

    or die (mysql_error());

while (
$record mysql_fetch_array($result) ) {

    
$records[] = $record;

}

$OUTPUT serialize($records);

$fp fopen($file,"w"); // open file with Write permission

fputs($fp$OUTPUT);

fclose($fp);

Looking at the sql_cache.txt file, it may
look something like this:

a:1:{i:0;a:6:{i:0;s:1:"1";s:11:"category_id";s:1:"1";i:1;s:9:"Computers";s:13:"category_name";s:9:
"Computers" ;i:2;s:25:"Description for computers";s:20:"category_description"
;s:25:"Description for computers";}}

This output is the internal representation of the variables
and their types. In this case you are using
mysql_fetch_array() that returns both
numeric indexed array and an associative array (which is why the data seems to
occur twice – once with the numeric index and once with the string
index).


Using the Cache


In order to use the cache, you will need to
unserialize() the information back into
the original format.

You can read the contents of the sql_cache.txt file into a
variable using the file_get_contents()
function.

Please note: This function is available in PHP version 4.3.0
and above only. If you are using an older version of PHP, a simple workaround is
using the file() function (reads an
entire file into an array, each new line becomes an array entry). The
implode() function is used to join the
array elements into one string to
unserialize().


// file_get_contents() workaround for PHP < 4.3.0

$file 'sql_cache.txt';

$records unserialize(implode('',file($file)));



You are
now able to go through the $records
array and get the data from the original query:


foreach ($records as $id=>$row) {

    print $row['category_name']."<br>";

}



Note that the
$records array is an array of arrays (a
numeric indexed array containing the query results – each row being a
numeric and string indexed array… what a mouthful).


Putting it all together


The decision whether to cache is time based in this instance. If the file modification
timestamp is greater than the current time less the expiration time set, the cache
is used, else the cache is updated.

  • Check file exists AND timestamp is less than
    expiry time set.
  • Get the records stored in the
    cache file or update the cache file.


$file 'sql_cache.txt';

$expire 86400// 24 hours (in seconds)

if (file_exists($file) && 

    
filemtime($file) > (time() - $expire)) {

    // Get the records stored in cache

    
$records unserialize(file_get_contents($file));

} else {

    
// Create the cache using serialize() function

}




Possible Additions

  • Storing cache results in shared memory for
    faster retrieval.
  • Adding a function that runs the
    SQL query randomly and checks if output is the same as cached output, if not,
    the cache is updated (this function can be given the probability of running once
    in every 100 script executions). Using a hashing algorithm (such as
    MD5()) can assist in determining if the
    contents of a string or file have changed.
  • Adding
    an administrative function that manually deletes the file, hence forcing the
    cache to update (as the file_exists()
    function above would return false). You can use the

    unlink() function to delete the
    file.


The Script


$file 'sql_cache.txt';

$expire 86400// 24 hours

if (file_exists($file) &&

    
filemtime($file) > (time() - $expire)) {

    $records unserialize(file_get_contents($file));

} else {

    
$link mysql_connect('localhost','username','password')

        or die (mysql_error());

    
mysql_select_db('shop')

        or die (
mysql_error());

    
/* form SQL query */

    
$query "SELECT * FROM categories";

    $result mysql_query($query)

        or die (
mysql_error());

    while (
$record mysql_fetch_array($result) ) {

        $records[] = $record;

    }

    
$OUTPUT serialize($records);

    
$fp fopen($file,"w");

    fputs($fp$OUTPUT);

    
fclose($fp);

// end else

// Query results are in $records array

foreach ($records as $id=>$row) {

    if ($row['category_id'] == $_REQUEST['category_id']) {

        
// category selected - print bold

        
print '<B>'.$row['category_name'].'</B><br>';

    } else {

        
// other category - print regular

        
print $row['category_name'].'<br>';

    }

// end foreach



About the Author


Ori Staub is a senior systems analyst, developer and
consultant specializing in web-based solutions. He can be contacted at
os@zucker-staub.com

6 Responses to “SQL Query Caching”

  1. akalend Says:

    the dbcache module is accessible at http://labyrinter.ru/dbcache/
    download at http://labyrinter.ru/dbcache/dbcache.tar.gz

    The dbcache module dependence from mysqli module, it module must be installing.

    install:
    1. phpize
    2. ./configure
    3. make && make install
    4. create cache directory and make permission

    the example:
    $mysqli = new mysqli("localhost", "akalend", "", "test") or die(‘Connect failed’);

    $cacheDir = ‘/home/akalend/cache’; // full cache path
    $dc = new dbCache( $cacheDir );
    $dc->bind( $mysqli ); // bind to mysqli object

    $query="select * from test_data limit 50";
    $livetime = 500; // the cache lifetime of seconds
    $dc->query($query, $livetime); // execute query
    while( $res = $dc->fetch() ){ // fetching data,
    print_r($res);
    }
    print_r( $dc->getInfo() ); // debug info

    You can to get part of recordset:

    $dc->query($query, $livetime); // execute query
    $res = $dc->fetchFrame( 120,10); // get frame from 120 record next 10 records
    print_r($res);
    it tools very usefull for view query by paging.

  2. akalend Says:

    [quote]is this the best way to cache database results? is there a way to cache the recordset which is returned by the query, as opposed to the html? [/quote]
    Yes, the HTML caching have more size. If You have cache less 1 Gb, You d’t have problem ;). I have cache 4 Gb by smarty and 750 Mb by SQLCaching.

    my email in pred post have error, the true email is akalend – mail[pt]ru

  3. akalend Says:

    the temporally link http://www.uploading.com/files/RYXBWP64/dbcache.zip.html
    30 day only…
    next time i open project on the sf.net

    the main idea:
    - binding to mysqli extention
    - filename is md5 hash of query
    - check caching by last update time + livetime parameter
    - using csv format

    code example:
    $mysqli = new mysqli("localhost", "akalend", "", "test") or die(‘Connect failed’);

    $cacheDir = ‘/home/akalend/cache’; // full path to cache dir.
    $dc = new dbCache( $cacheDir );
    $dc->bind( $mysqli ); // bind to mysqli object

    $query="select * from test_data limit 50";
    $livetime = 500; // time of live cache at sec
    $dc->query($query, $livetime); // execute query
    while( $res = $dc->fetch() ){ // fetching data, return ass array or false if finish
    print_r($res);
    }
    print_r( $dc->getInfo() ); // debugging info

    more info you can to find in the distibutive.

    Why binding to mysqli object?
    The dbcache module use open connection from mysqli ext. The PHP project can to use query whithout recordset or whithout ceching and You can to use the mysqli extention. For caching data You can to use the dbcache extention with already one open connection.

    I will happynes answer to any question by mail alakend – mail-ru.

  4. _____anonymous_____ Says:

    is this the best way to cache database results? is there a way to cache the recordset which is returned by the query, as opposed to the html?

  5. akalend Says:

    The idea of caching SQL data is not new.
    The code has one lack, it is productivity.
    The function serialize()/unserialize() is very slow
    and using for Highly loaded system is not good.
    As I understand, the cacheing is use for high-load system.

    I se solve the problem in using csv format for caching.
    So, to increase speed processing is possible by move part logics in the extention.
    Now I develope the sql-caching extention.

    PS. sorry for my english. I will very glad to moderator for correction my post.

  6. gabrielcp100 Says:

    This query caching safe my life/site…
    thanks

    <a href="http://muuzik.net/">Muuzik</a&gt;