Working with RAR, LZF and BZ2 Compression Formats in PHP

November 3, 2008


The Ultimate Toolbox

When it comes to dealing with different file formats, it’s hard to faze PHP. XML documents, PDF files, JPEG images, MP3 media…you name it and, chances are, there’s a PHP extension to handle it. And so it is with compression formats like RAR, LZF and Bzip2. Although these formats are far less common today than the ubiquitous TAR and ZIP formats, they are still actively used by many applications and projects, and continue to be supported in PHP via extensions, most of them from PECL.

That’s where this article comes in. Over the next few pages, I’ll be introducing you to some PHP extensions that allow you to create, view and manipulate compressed files in these formats. Keep reading, and be prepared to be amazed!

Kicking The Tyres

RAR support in PHP comes through PECL’s ext/rar extension, which is maintained by Antony Dovgal and provides a function-oriented, read-only API for listing and extracting RAR archive files. The LZF extension, maintained by Marcin Gibula, allows for LZF compression and decompression, while the Bzip2 extension (ext/bz2, which ships with the PHP source distribution rather than through PECL) handles Bzip2 compression and decompression.

The easiest way to install these packages is with the automated PECL installer, which takes care of downloading and compiling the extension as a loadable PHP module. Here’s an example, using the RAR extension:

shell# pecl install rar

Alternatively, download the source code archive from http://pecl.php.net/package/rar, unpack it, and compile it into a loadable PHP module with phpize:

shell# cd rar-1.0.0
shell# phpize
shell# ./configure 
shell# make
shell# make install

This procedure should create a loadable PHP module named rar.so in your PHP extension directory. You should now enable the extension in the php.ini configuration file (typically with a line such as extension=rar.so), restart your Web server, and check that the extension is enabled by calling phpinfo() and looking for the extension in its output.

To enable LZF support, follow a similar process: download, compile and install the extension, either with the PECL installer or manually, and you’re good to go. To enable Bzip2 support, though, you will need to recompile PHP with the --with-bz2 configure-time option.

Windows users have a much easier time of it; pre-compiled Windows versions of php_rar.dll, php_lzf.dll and php_bz2.dll can be downloaded from http://pecl4win.php.net/. Once you’ve got the files, place them in your PHP extensions directory, activate them via your php.ini configuration file, and restart your Web server. You should now be able to see the extensions’ active status with phpinfo(), as described above.
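If you’d rather check from code than scan phpinfo() output, PHP’s built-in extension_loaded() function reports whether a given extension is active. The helper below is a small sketch: the name missing_extensions() is hypothetical, and the extension names 'rar', 'lzf' and 'bz2' are assumed to be the names these modules register.

```php
<?php
// Return the names of any required extensions that are NOT loaded.
// missing_extensions() is a hypothetical helper, not part of PHP.
function missing_extensions(array $required)
{
    $missing = array();
    foreach ($required as $name) {
        // extension_loaded() returns true if the named extension is active
        if (!extension_loaded($name)) {
            $missing[] = $name;
        }
    }
    return $missing;
}
```

For example, calling missing_extensions(array('rar', 'lzf', 'bz2')) at the top of a script, and stopping with a message if the result is non-empty, turns a "Call to undefined function rar_open()" fatal error into a clearer diagnostic.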

Looking Inwards

Assuming you’ve got all the pieces working, let’s get things rolling with a simple example: reading an existing RAR file with PHP, and printing a list of its contents:

<?php
// open archive 
$rar = rar_open(realpath('.') . "/app-0.2.rar") or die('ERROR: Could not open archive!');

// get list of files in archive
$entries = rar_list($rar);
foreach ($entries as $entry) {
  printf("%s (%d bytes)\n", $entry->getName(), $entry->getUnpackedSize());
}

// close archive
rar_close($rar);
?>

The script begins by using the rar_open() function to open the RAR archive file. If successful, this function returns a resource handle representing the RAR archive; this handle serves as the entry point to all the other RAR functions. Next, the rar_list() function is used to produce an array containing information about the archive’s contents. Each element of this array represents a single file (or entry) from the archive, and is stored as an object. Object methods, such as getName() and getUnpackedSize(), are used to obtain additional information on the corresponding file and, once the entire array is processed, the archive is closed and its resource handle released with a call to the rar_close() function.

Here’s a snippet of what the output looks like:

TODO.txt (11590 bytes)
im3.txt (38736 bytes)
im5.txt (7902 bytes)
im6.txt (17387 bytes)
im7.txt (7766 bytes)
zend\Picture004.jpg (253525 bytes)
zend\scan0002a.jpg (2062719 bytes)
zend (0 bytes)

Each object comes with various methods and properties, which can be used to obtain detailed information on the file it represents. The previous listing makes use of the getName() and getUnpackedSize() methods, which return the file name and uncompressed file size, respectively.

To retrieve information on a particular file (rather than the whole shebang), use the rar_entry_get() function with the name of the corresponding entry. Here’s an example:

<?php
// open archive 
$rar = rar_open(realpath('.') . "/app-0.2.rar") or die('ERROR: Could not open archive!');

// get specific file from archive
$entry = rar_entry_get($rar, 'zend\Picture004.jpg') or die('ERROR: Could not get entry!');
printf("%s (%d bytes)", $entry->getName(), $entry->getUnpackedSize());

// close archive
rar_close($rar);
?>
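The rar_* functions shown here belong to the ext/rar release that was current when this article was written. Later releases of the extension also provide an object-oriented interface built around RarArchive and RarEntry classes (RarArchive::open(), getEntries(), close()). The sketch below assumes one of those later releases is installed; the wrapper name rar_entry_sizes() is hypothetical, and it returns false instead of dying when the extension or the archive is unavailable.

```php
<?php
// Map each entry name to its uncompressed size using ext/rar's
// object-oriented API. Assumes a release that provides RarArchive.
function rar_entry_sizes($path)
{
    if (!class_exists('RarArchive')) {
        return false; // extension missing or too old for the OO API
    }
    $rar = RarArchive::open($path);
    if ($rar === false) {
        return false; // file missing or not a readable RAR archive
    }
    $sizes = array();
    $entries = $rar->getEntries(); // array of RarEntry objects, or false
    if ($entries !== false) {
        foreach ($entries as $entry) {
            $sizes[$entry->getName()] = $entry->getUnpackedSize();
        }
    }
    $rar->close();
    return $sizes;
}
```

With the sample archive, rar_entry_sizes(realpath('.') . '/app-0.2.rar') would return one name-to-size pair per entry, such as 'im3.txt' => 38736.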

Method Madness

The entry objects offer several other methods as well. Here’s an example which demonstrates some of them:

<html>
  <head>
    <style type="text/css">
    td {
      border: solid 1px black;
      padding: 5px;
    }
    .head {
      font-weight: bold;  
    }
    </style>
  </head>
  <body>
    <table>
      <tr class="head">
        <td>File name</td>
        <td>Uncompressed size</td>
        <td>Compressed size</td>
        <td>Last modified</td>
        <td>CRC</td>
        <td>Pack method</td>
      </tr>
      <?php
      // open archive 
      $rar = rar_open(realpath('.') . "/app-0.2.rar") or die("Could not open archive");
      
      // get list of files in archive
      $entries = rar_list($rar);
      
      // for each entry, print its details as a table row
      foreach ($entries as $entry) {
      ?>
        <tr>
          <td><?php echo htmlspecialchars($entry->getName()); ?></td>
          <td><?php echo $entry->getUnpackedSize(); ?></td>
          <td><?php echo $entry->getPackedSize(); ?></td>
          <td><?php echo $entry->getFileTime(); ?></td>
          <td><?php echo $entry->getCRC(); ?></td>
          <td><?php echo $entry->getMethod(); ?></td>
        </tr>
      <?php
      }
      
      // close archive
      rar_close($rar);
      ?>
    </table>
  </body>  
</html>

This script introduces the getPackedSize(), getFileTime(), getCRC() and getMethod() methods, which return the compressed file size, last modification time, CRC checksum and compression method used for each entry in the RAR archive. The result is an HTML table with one row per archive entry.
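getMethod() returns a number rather than a name. Assuming it passes through unrar’s packing-method codes (0x30 for stored files up to 0x35 for best compression, as documented for unrar’s RARHeaderData structure), a small lookup table can turn it into something readable for the "Pack method" column. The function name describe_rar_method() is hypothetical.

```php
<?php
// Translate a RAR packing-method code into a readable label.
// Assumption: codes follow unrar's 0x30..0x35 convention.
function describe_rar_method($code)
{
    $labels = array(
        0x30 => 'store',
        0x31 => 'fastest',
        0x32 => 'fast',
        0x33 => 'normal',
        0x34 => 'good',
        0x35 => 'best',
    );
    $code = (int) $code;
    return isset($labels[$code])
        ? $labels[$code]
        : sprintf('unknown (0x%02X)', $code); // show unexpected codes in hex
}
```

In the table above, the last cell would then print describe_rar_method($entry->getMethod()).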

Decompression Chamber

The RAR extension doesn’t let you create new compressed RAR archives, but it does allow you to decompress existing RAR archives. Each entry object returned by rar_list() or rar_entry_get() comes with an extract() method, which accepts a directory path as argument, and handles the task of extracting the corresponding file to the specified directory.

Here’s an example, which extracts all the files from the RAR archive into an out/ subdirectory of the current directory:

<?php
// open archive 
$rar = rar_open(realpath('.') . "/app-0.2.rar") or die("Could not open archive");

// get list of files in archive
$entries = rar_list($rar);

// iterate over list, extracting each file
foreach ($entries as $entry) {
  if ($entry->extract(realpath('.') . "/out/")) {
    echo 'Extracted ' . $entry->getName() . "\n";
  }
}

// close archive
rar_close($rar);
?>

Note that if the target directory does not exist, the extract() method will attempt to create it for you. However, while extraction works flawlessly on Windows, I did encounter problems using it on Linux.
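If you would rather not depend on extract() creating the target directory, or want a clear error when it can’t be created, you can create it yourself first with mkdir() in recursive mode. The helper below, prepare_output_dir() (a hypothetical name), returns the directory’s resolved path, or false if it can’t be created or written to.

```php
<?php
// Make sure $dir exists and is writable before extracting into it.
// Returns the resolved path, or false on failure.
function prepare_output_dir($dir, $mode = 0755)
{
    // third mkdir() argument = also create missing parent directories;
    // the second is_dir() check covers another process creating it first
    if (!is_dir($dir) && !@mkdir($dir, $mode, true) && !is_dir($dir)) {
        return false;
    }
    if (!is_writable($dir)) {
        return false;
    }
    return realpath($dir);
}
```

In the extraction loop, you would call $entry->extract() with the returned path only after checking that prepare_output_dir(realpath('.') . '/out') did not return false.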

You can also selectively extract certain files from the source archive, by checking each entry’s name against an array listing the files you want. Here’s an example:

<?php
// list of files to extract
$fileList = array(
  'zend\Picture004.jpg', 
  'im3.txt'
);

// open archive 
$rar = rar_open(realpath('.') . "/app-0.2.rar") or die("Could not open archive");

// get list of files in archive
$entries = rar_list($rar);

// iterate over list, extracting only 
// files listed in array
foreach ($entries as $entry) {
  if (in_array($entry->getName(), $fileList)) {
    if ($entry->extract(realpath('.') . "/out/")) {
      echo 'Extracted ' . $entry->getName() . "\n";
    }
  }
}

// close archive
rar_close($rar);
?>
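Calling in_array() for every entry is fine for a short list; for longer lists, flipping the wanted names into array keys gives constant-time lookups. The sketch below also converts backslashes to forward slashes on both sides before comparing, on the assumption that the separator reported by getName() may vary between platforms or extension versions. That normalization is my own addition, and the function name select_entries() is hypothetical; it works with any objects that have a getName() method, such as the entries returned by rar_list().

```php
<?php
// Return only the entries whose names appear in $wanted.
// Separators are normalized so 'zend\a.jpg' matches 'zend/a.jpg'.
function select_entries(array $entries, array $wanted)
{
    // build a set of wanted names for fast isset() lookups
    $lookup = array();
    foreach ($wanted as $name) {
        $lookup[strtr($name, '\\', '/')] = true;
    }
    $selected = array();
    foreach ($entries as $entry) {
        if (isset($lookup[strtr($entry->getName(), '\\', '/')])) {
            $selected[] = $entry;
        }
    }
    return $selected;
}
```

The loop above could then iterate over select_entries(rar_list($rar), $fileList) and call extract() on every entry it returns.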

X-Ray Vision

Now that you have a fair idea about ext/rar’s capabilities, let’s put what we’ve learned to the test, with a simple application. The script below accepts a RAR archive for upload and prints its contents using the rar_list() function:

<html>
  <head>
    <style type="text/css">
    td {
      border: solid 1px black;
      padding: 5px;
    }
    .head {
      font-weight: bold;  
    }
    </style>
  </head>
  <body>

<?php
if (!isset($_POST['submit'])) {
?>
    <form action="<?=htmlentities($_SERVER['PHP_SELF']); ?>" method="POST" enctype="multipart/form-data">
     Select a file:
     <input type="file" name="file">
     <p>
     <input type="Submit" name="submit" value="Send File">
    </form>
<?php
} else {

  // add some more file security checks
  // eg: file size > 0
  
  if (is_uploaded_file($_FILES['file']['tmp_name'])) {        
?>        
    <table>
      <tr>
          <td><b>Filename</b></td>
          <td><b>Uncompressed size</b></td>
          <td><b>Compressed size</b></td>
          <td><b>Pack ratio</b></td>
          <td><b>Last modified</b></td>
      </tr>   
<?php
    $filename = $_FILES['file']['tmp_name'];
    
    // open uploaded file
    $rar = rar_open($filename) or die("Could not open archive");
    $entries = rar_list($rar);
    
    // iterate over file list
    // print details of each file
    foreach ($entries as $entry) {
?>
      <tr>
        <td><?php echo htmlspecialchars($entry->getName()); ?></td>
        <td><?php echo $up = $entry->getUnpackedSize(); ?></td>
        <td><?php echo $p = $entry->getPackedSize(); ?></td>
        <td><?php echo ($up > 0) ? sprintf('%0.2f', $p/$up) : '-'; ?></td>
        <td><?php echo $entry->getFileTime(); ?></td>
      </tr>
<?php
    }
    
    // close archive
    rar_close($rar);  
?>    
    </table>
<?php
  } else {
    die ('ERROR: Invalid file!');
  }
}
?>
 </body>
</html>

The script above is divided into two main parts, separated from each other by an if() condition:

  1. The first part of the script checks if the form has been submitted and, if not, displays a file selection box which the user can use to select a file for upload. Note that since this POST transaction involves a file transfer, the form’s encoding type (its enctype attribute) must be set to multipart/form-data.
  2. Once a file has been uploaded, the second half of the script examines the $_FILES array and checks that the file was uploaded correctly. Assuming it is, the rar_open() function is used to open the archive and the rar_list() function is used, in combination with a loop and the various get*() methods, to display the contents of the archive in a neatly-formatted HTML table. Notice that the script also calculates a compression ratio for each file, by dividing the compressed file size by the uncompressed file size; the lower the number, the better the compression.
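The per-file ratio can also be rolled up into a single figure for the whole archive, by dividing the total compressed size by the total uncompressed size. archive_pack_ratio() is a hypothetical helper that accepts any array of objects with getPackedSize() and getUnpackedSize() methods, such as the output of rar_list(); as with the per-file column, lower is better.

```php
<?php
// Overall compression ratio for a list of archive entries:
// total packed bytes / total unpacked bytes, or null if nothing to measure.
function archive_pack_ratio(array $entries)
{
    $packed = 0;
    $unpacked = 0;
    foreach ($entries as $entry) {
        $packed += $entry->getPackedSize();
        $unpacked += $entry->getUnpackedSize();
    }
    // directory entries contribute 0 to both totals
    return $unpacked > 0 ? $packed / $unpacked : null;
}
```

The upload script could print sprintf('%0.2f', archive_pack_ratio($entries)) below the table, after checking that the result is not null.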


Note that the script above is illustrative only – allowing users to upload files to your Web application is an inherently dangerous process and one which opens up multiple security holes. If you plan to use this example in a live environment, you should beef up the security checks within the code to avoid malicious uploads.
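As a starting point for those checks, the sketch below rejects uploads that PHP reports as failed, that are empty or larger than a limit you choose, or whose first bytes are not the RAR signature. It assumes the six-byte marker "Rar!" followed by bytes 0x1A and 0x07, which begins RAR archives in both the older format and RAR 5.0. The function names and the 10 MB default limit are hypothetical, and a matching signature only shows that a file claims to be a RAR archive; it does not make it safe to process.

```php
<?php
// Does the file at $path start with the RAR signature "Rar!\x1A\x07"?
function has_rar_signature($path)
{
    $fh = @fopen($path, 'rb');
    if ($fh === false) {
        return false;
    }
    $magic = fread($fh, 6);
    fclose($fh);
    return $magic === "Rar!\x1A\x07";
}

// Validate one $_FILES entry; returns an error message, or null if OK.
function check_rar_upload(array $file, $maxBytes = 10485760) // 10 MB, arbitrary
{
    if (!isset($file['error'], $file['size'], $file['tmp_name'])
        || $file['error'] !== UPLOAD_ERR_OK) {
        return 'Upload failed';
    }
    if ($file['size'] <= 0 || $file['size'] > $maxBytes) {
        return 'File is empty or too large';
    }
    if (!is_uploaded_file($file['tmp_name'])) {
        return 'Not an uploaded file';
    }
    if (!has_rar_signature($file['tmp_name'])) {
        return 'Not a RAR archive';
    }
    return null;
}
```

In the upload script, $error = check_rar_upload($_FILES['file']) could replace the bare is_uploaded_file() test, with the script printing the message and stopping whenever $error is not null.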

The Simple Life

If what you’re really after is compressing individual files (rather than creating compressed archives containing multiple files), PHP’s LZF and Bzip2 extensions have you covered. Let’s look at the LZF extension first.

PHP’s LZF extension operates primarily through two functions: lzf_compress() and lzf_decompress() – no prizes for guessing what they do! Here’s a simple example of using the lzf_compress() function to compress an Excel spreadsheet:

<?php
// set input and output files
$in = 'info.xls';
$out = 'compressed.lzf';

// compress file using LZF
if (file_exists($in)) {
  $data = file_get_contents($in) or die('ERROR: Cannot read from input file!');
  if (file_put_contents($out, lzf_compress($data))) {
    echo 'Compressed file created.';
  } else {
   die('ERROR: Cannot write to output file!'); 
  }
}
?>

To recover the original file, read the compressed file into a string, pass it to lzf_decompress(), and write the result to a new file. Here’s the code:

<?php
// set input and output files
$out = 'out-lzf.xls';
$in = 'compressed.lzf';

// decompress file
if (file_exists($in)) {
  $data = file_get_contents($in) or die('ERROR: Cannot read input file!');
  if (file_put_contents($out, lzf_decompress($data))) {
    echo 'Decompression complete.';  
  } else {
    die('ERROR: Cannot write to output file!');  
  }
}
?>

Nothing very complicated here – the functions are intuitive, simple and easy to use, making them ideal for situations where you need to quickly compress and decompress packets of binary data!
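Since a compressor and its decompressor should be exact inverses, it’s cheap insurance to confirm that decompressing gives back exactly the bytes you started with before you discard an original. roundtrip_size() below is a hypothetical helper that takes the two functions as callables, so the same check works for lzf_compress() and lzf_decompress() or any other pair; it treats a non-string result as failure, since these functions signal errors with non-string return values.

```php
<?php
// Compress $data, decompress the result, and check that the original
// bytes come back unchanged. Returns the compressed length, or false.
function roundtrip_size(callable $compress, callable $decompress, $data)
{
    $packed = call_user_func($compress, $data);
    if (!is_string($packed)) {
        return false; // compressor reported an error
    }
    $restored = call_user_func($decompress, $packed);
    if (!is_string($restored) || $restored !== $data) {
        return false; // decompression failed, or the data changed
    }
    return strlen($packed);
}
```

For the spreadsheet example, roundtrip_size('lzf_compress', 'lzf_decompress', file_get_contents('info.xls')) returns the compressed size only if the round trip is lossless.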

Zip, Zap, Zoom

If you’re looking for better compression ratios (and don’t mind a performance hit), ext/bz2 might suit you better than ext/lzf. LZF is built for speed, while Bzip2 usually produces noticeably smaller output at the cost of slower compression and decompression. Here’s an example of how to use it:

<?php
// set input and output files
$in = 'info.xls';
$out = 'compressed.bz2';

// compress file using BZIP2
if (file_exists($in)) {
  $data = file_get_contents($in) or die('ERROR: Cannot read input file');
  $bz = bzopen($out, 'w') or die('ERROR: Cannot open output file!');
  bzwrite($bz, $data) or die('ERROR: Cannot write to output file!');
  bzclose($bz);
  echo 'Compressed file created.';
}
?>

The bzopen(), bzwrite() and bzclose() functions operate in a similar manner to PHP’s fopen(), fwrite() and fclose() functions, except that they’re designed for use with Bzip2-compressed files. The previous script demonstrates them in action: it first opens a handle to the output file (in write mode) and then uses the bzwrite() function to compress the input data and write it to this file handle. Finally, bzclose() closes the handle, flushing any remaining compressed data to disk.
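When the data is already in a string, ext/bz2 also offers bzcompress() and bzdecompress(), which work entirely in memory without a file handle. The sketch below assumes ext/bz2 is loaded. bzcompress()’s optional second argument is the block size (1 to 9, where 9 compresses best and the default is 4), and both functions return an integer error code rather than a string when something goes wrong, which is why the result types are checked. The helper names bz2_pack() and bz2_unpack() are hypothetical.

```php
<?php
// Compress a string with bzip2 at the largest block size (9).
// Returns the compressed string, or false on error.
function bz2_pack($data)
{
    $packed = bzcompress($data, 9);
    return is_string($packed) ? $packed : false; // int = bzip2 error code
}

// Reverse bz2_pack(); returns the original string, or false on error.
function bz2_unpack($packed)
{
    $data = bzdecompress($packed);
    return is_string($data) ? $data : false;
}
```

Writing the result of bz2_pack(file_get_contents('info.xls')) to compressed.bz2 with file_put_contents(), after checking that it isn’t false, produces a file that bzopen() can read back.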

Reversing the process is equally simple. All that’s needed is to open a handle to the compressed file, and read and uncompress its contents using the bzread() function. This uncompressed data can then be saved back to a separate file, for later viewing or manipulation. Here’s an example:

<?php
// set input and output files
$out = 'out-bz2.xls';
$in = 'compressed.bz2';

// decompress file using BZIP2
if (file_exists($in)) {
  $data = '';
  $bz = bzopen($in, 'r') or die('ERROR: Cannot open input file!');
  while (!feof($bz)) {
    // read and decompress the next chunk (up to 4096 bytes)
    $chunk = bzread($bz, 4096);
    if ($chunk === false) {
      die('ERROR: Cannot read from input file!');
    }
    $data .= $chunk;
  }
  bzclose($bz);  
  file_put_contents($out, $data) or die('ERROR: Cannot write to output file!');
  echo 'Decompression complete.';  
}
?>
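Both of these scripts hold the entire file in memory. For large files, one alternative is the compress.bzip2:// stream wrapper that ext/bz2 registers: data written to a stream opened through it is compressed on the fly, and data read from it is decompressed. The sketch below copies between a plain file stream and a wrapped one with stream_copy_to_stream(), so only small buffers are held in memory at a time; the function name bz2_stream_copy() is hypothetical.

```php
<?php
// Copy $from to $to through the compress.bzip2:// wrapper.
// $compress = true writes a .bz2 file; false reads one back.
// Returns the number of uncompressed bytes copied, or false.
function bz2_stream_copy($from, $to, $compress = true)
{
    if (!in_array('compress.bzip2', stream_get_wrappers(), true)) {
        return false; // ext/bz2 is not loaded
    }
    $src = $compress ? @fopen($from, 'rb') : @fopen('compress.bzip2://' . $from, 'r');
    if ($src === false) {
        return false;
    }
    $dst = $compress ? @fopen('compress.bzip2://' . $to, 'w') : @fopen($to, 'wb');
    if ($dst === false) {
        fclose($src);
        return false;
    }
    $copied = stream_copy_to_stream($src, $dst);
    fclose($src);
    fclose($dst); // closing the bzip2 stream flushes the final block
    return $copied;
}
```

With this helper, bz2_stream_copy('info.xls', 'compressed.bz2') compresses the spreadsheet, and bz2_stream_copy('compressed.bz2', 'out-bz2.xls', false) restores it.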

And, if you compare the compressed output generated by the LZF and Bzip2 extensions, you’ll see that the Bzip2 version is typically smaller than the LZF version!
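To run that comparison on your own data, you can feed the same string to each available compressor and record the output sizes. compressed_sizes() is a hypothetical helper; it skips any compressor that isn’t callable (for instance, because its extension isn’t loaded) and reports null for one that returns an error instead of a string.

```php
<?php
// Compress $data with each labelled callable and report the output size.
function compressed_sizes($data, array $compressors)
{
    $sizes = array();
    foreach ($compressors as $label => $fn) {
        if (!is_callable($fn)) {
            continue; // extension not loaded; skip it
        }
        $out = call_user_func($fn, $data);
        $sizes[$label] = is_string($out) ? strlen($out) : null;
    }
    return $sizes;
}
```

For example, compressed_sizes(file_get_contents('info.xls'), array('lzf' => 'lzf_compress', 'bz2' => 'bzcompress')) returns one size per loaded extension, which makes the difference between the two easy to see.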

And that’s about it for this tutorial. Over the last few pages, I took you on a whirlwind tour of three PHP extensions designed specifically for working with compressed file formats: the RAR, LZF and Bzip2 extensions. As the examples in this tutorial demonstrate, these three extensions can aid you significantly in your daily development, either by allowing you to programmatically view and extract file archives, or by allowing you to reduce your application’s disk footprint through dynamic compression and decompression of binary data.


Until next time…happy archiving!

This article is copyright Melonfire, 2008. All rights reserved.

About Vikram Vaswani

Vikram Vaswani is the founder and CEO of Melonfire (http://www.melonfire.com/), a consultancy specializing in open-source tools and technologies. He is a passionate proponent of the open-source movement and frequently contributes articles and tutorials on open-source technologies, including Perl, Python, PHP, MySQL, and Linux, to the community at large. He is the author of four books on PHP and MySQL, including "MySQL: The Complete Reference" (http://www.mysql-tcr.com/), "How to Do Everything with PHP and MySQL" (http://www.everythingphpmysql.com/) and "PHP Programming Solutions" (http://www.php-programming-solutions.com/). Vikram has more than eight years of experience working with PHP and MySQL as an application developer. He is the author of Zend Technologies' PHP 101 series (http://devzone.zend.com/tag/PHP101) for PHP beginners, and has extensive experience deploying PHP in a variety of different environments (including corporate intranets, high-traffic Internet Web sites, and mission-critical thin client applications). A Felix Scholar at the University of Oxford, England, Vikram combines his interest in Web application development with various other activities. When not dreaming up plans for world domination, he amuses himself by reading crime fiction, watching old movies, playing squash, blogging, and keeping an eye out for unfriendly Agents.

