Reading and Writing MP3 and Photo Metadata with PECL

November 10, 2008

Data Overload

Like most other people who have a digital camera and an iPod, my hard drive is a mess of photos, MP3s and videos. And not only does this collection grow very rapidly, by giga-leaps and bounds, but finding a particular track or photo in it can take me hours on end, leading to much frustration, heartburn and wasted Sundays.

It’s not that I’m messy by nature: I absolutely do intend to sit down one day and organize it all so that putting my digital finger on the photos I took, say, in Capri in the summer of 2006 doesn’t take more than a few seconds. Scout’s honour. It’s just that every time I sit down to have a go at it, the sheer volume of data overwhelms me and I take the command decision to deal with something easier instead.

Sounds familiar? If it does, help is at hand, in the form of PHP’s ID3 and EXIF extensions. These extensions can help you organize and catalog your digital media collection so that it’s easier to navigate and search. Keep reading, and I’ll show you how.

Hiding In Plain Sight

First up, the basics. Both digital audio files and photographs can store additional descriptive information (such as the track title and artist, or the photo resolution and exposure) within the file itself. For MP3 audio files, these additional descriptors are called ID3 tags; for digital photographs, the format used is the Exchangeable Image File Format (EXIF).

ID3 support in PHP comes through PECL’s ext/id3 extension, which is maintained by Stephan Schmidt and Carsten Lucke and provides a function-based API for reading and writing ID3 information. This extension, which is available for both Windows and *NIX, can be installed using the pecl command, as below:

shell# pecl install id3

The PECL installer will now download the source code, compile it and install it to the appropriate location on your system.

Alternatively, manually download the source code archive (v0.2 at this time) from http://pecl.php.net/package/id3 and compile it into a loadable PHP module with phpize:

shell# cd id3-0.2
shell# phpize
shell# ./configure
shell# make
shell# make install

This procedure should create a loadable PHP module named id3.so in your PHP extension directory. You should now enable the extension in the php.ini configuration file.

EXIF support in PHP is much easier to enable, as the EXIF module is bundled with the PHP source code. To enable it on *NIX, add the --enable-exif parameter when configuring your PHP build; on Windows, enable both the multi-byte string (mbstring) module and the EXIF module by uncommenting the corresponding lines in the php.ini configuration file, making sure that mbstring is loaded before EXIF. More information on this can be obtained from the PHP manual, at http://www.php.net/exif

Once you’ve got both extensions installed, restart the Web server and check that they’re working with a quick call to phpinfo(); the output should include a section for each extension.
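
If you’d rather check from a script than scan the phpinfo() page, something like the following minimal sketch should do. It relies only on PHP’s built-in extension_loaded() function; the check_metadata_extensions() helper name is my own invention for this example and is not part of either extension.

```php
<?php
// Hypothetical helper: report which of the named extensions are loaded.
function check_metadata_extensions(array $names = array('id3', 'exif', 'mbstring')) {
    $status = array();
    foreach ($names as $name) {
        // extension_loaded() returns true if the extension is available
        $status[$name] = extension_loaded($name);
    }
    return $status;
}

// Print one line per extension
foreach (check_metadata_extensions() as $name => $loaded) {
    echo $name . ': ' . ($loaded ? 'loaded' : 'NOT loaded') . PHP_EOL;
}
```

Any extension reported as not loaded points back to the php.ini step described above.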

Playing Tag

Let’s take a look at the ID3 extension first. Typically, an MP3 file includes ID3 tags for the track title, artist, album, genre, original playlist sequence number, and a free-form comment. All of this information can be accessed via the id3_get_tag() function, which accepts an MP3 file path as argument and returns an associative array of ID3 tags and values. Here’s an example:

<?php
// get and print tags
$file = 'xyz.mp3';
$tags = id3_get_tag($file);
print_r($tags);
?>

And here’s what the output might look like:

Array
(
    [title] => My All                        
    [artist] => Mariah Carey                  
    [album] =>                               
    [year] => 1997
    [comment] => Encoded by The One
    [genre] => 13
)

Fairly self-explanatory, except for the ‘genre’ key. This key holds a number between 0 and 147 that maps to a genre name. To convert that number into a human-readable genre label, try passing it through the id3_get_genre_name() function, as below:

<?php
// get and print tags
$file = 'xyz.mp3';
$tags = id3_get_tag($file);
// convert genre ID to name
if (isset($tags['genre'])) {
  $tags['genre'] = id3_get_genre_name($tags['genre']);
}
print_r($tags);
?>

And here’s the revised output:

Array
(
    [title] => My All                        
    [artist] => Mariah Carey                  
    [album] =>                               
    [year] => 1997
    [comment] => Encoded by The One
    [genre] => Pop
)
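
Wondering about the trailing spaces after the title and artist? They come from the version 1 ID3 layout itself: a fixed 128-byte block at the very end of the file, holding the marker 'TAG', then 30 bytes each for title, artist and album, 4 bytes for the year, 30 for the comment and a single genre byte, with short values padded out (with spaces or NUL bytes) to the full width. The sketch below reads that block in plain PHP, purely to illustrate the layout; the function names are hypothetical, and this is not a description of how the id3 extension is implemented.

```php
<?php
// Hypothetical helper: decode a 128-byte ID3 v1 block into an array.
// Returns false if the block is not a valid ID3 v1 tag.
function parse_id3v1_block($block) {
    if (strlen($block) !== 128 || substr($block, 0, 3) !== 'TAG') {
        return false;
    }
    // A = fixed-width string with trailing padding stripped, C = one unsigned byte
    $tag = unpack('A3marker/A30title/A30artist/A30album/A4year/A30comment/Cgenre', $block);
    unset($tag['marker']);
    // ID3 v1.1: a zero byte at offset 125 means offset 126 holds the track number
    if ($block[125] === "\0" && $block[126] !== "\0") {
        $tag['comment'] = rtrim(substr($block, 97, 28), " \0");
        $tag['track'] = ord($block[126]);
    }
    return $tag;
}

// Hypothetical helper: read the last 128 bytes of a file and decode them.
function read_id3v1($path) {
    $fh = @fopen($path, 'rb');
    if ($fh === false) {
        return false;
    }
    $block = (fseek($fh, -128, SEEK_END) === 0) ? fread($fh, 128) : false;
    fclose($fh);
    return is_string($block) ? parse_id3v1_block($block) : false;
}

$tags = read_id3v1('xyz.mp3');
print_r($tags === false ? 'No ID3 v1 tag found' : $tags);
```

The fixed field widths also explain why long titles come back cut off at 30 characters.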

It’s worth noting that ID3 tags come in two versions, 1.x and 2.x. In most cases, the ID3 extension will automatically determine which version is in use; however, you can also specify a particular version by passing the corresponding constant to the id3_get_tag() function (see the next code listing for an example).

Wrap a DirectoryIterator around this and you can dynamically generate a listing of all the tracks in a particular directory. Here’s an example:

<html>
  <head></head>
  <body>
    <table border="1">      
      <tr>
        <td>FILENAME</td>
        <td>TITLE</td>
        <td>ARTIST</td>
      </tr>
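      <?php
      // list every MP3 file in the current directory with its ID3 v1 tags
      $iterator = new DirectoryIterator('.');
      foreach ($iterator as $file) {
        // match a literal ".mp3" extension, case-insensitively
        if ($file->isFile() && preg_match('/\.mp3$/i', $file->getFilename())) {
          $tags = id3_get_tag($file->getPathname(), ID3_V1_0);
          // fall back to empty strings when a tag is missing
          $title = isset($tags['title']) ? trim($tags['title']) : '';
          $artist = isset($tags['artist']) ? trim($tags['artist']) : '';
          // escape all values before writing them into the HTML table
          echo '<tr>';
          echo '<td>' . htmlspecialchars($file->getFilename()) . '</td>';
          echo '<td>' . htmlspecialchars($title) . '</td>';
          echo '<td>' . htmlspecialchars($artist) . '</td>';
          echo '</tr>';
        }
      }
      ?>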
    </table>
  </body>
</html>

Rock and Roll

PHP’s ID3 extension doesn’t just allow you to read track metadata; it also lets you update existing tags or write new ones via its id3_set_tag() function. Here’s an example, which sets the title, artist, year, genre and comment for a particular file:

<?php
// set array of tag information
$file = 'xyz.mp3';
$tags = array(
  'artist'  => 'Aerosmith',
  'title'   => 'Cryin\'',
  'year'    => '1998',
  'genre'   => id3_get_genre_id('Rock'),
  'comment' => 'rock on!'  
);
// write tags to file
if (is_writable($file)) {
  if (id3_set_tag($file, $tags) === true) {
    echo 'SUCCESS: Tags set!';  
    print_r(id3_get_tag($file));
  } else {
    echo 'FAIL: Tags could not be set.';      
  }
} else {
  echo 'FAIL: File is not writeable.';        
}
?>

Here’s the likely output:

SUCCESS: Tags set!


Array
(
    [title] => Cryin'
    [artist] => Aerosmith
    [album] =>                               
    [year] => 1998
    [comment] => rock on!
    [genre] => 17
)

And the id3_remove_tag() function makes it easy to remove all the tags from a file. Here’s an example:

<?php
// remove tags from file
$file = 'xyz.mp3';
if (is_writable($file)) {
  if (id3_remove_tag($file) === true) {
    echo 'SUCCESS: Tags removed!';  
  } else {
    echo 'FAIL: Tags could not be removed.';    
  }
} else {
    echo 'FAIL: File is not writeable.';      
}
?>

SQLite My Fire!

Now that you know the basics, let’s look at a couple of applications. First up, how about solving the data organization problem, by automatically cataloging all the MP3 files on the disk in a database, so that it becomes easier to find specific tracks? With PHP’s ID3 functions, this is a snap!

First, create an SQLite database to hold the data, as follows:

sqlite> CREATE TABLE music (
   ...> id INTEGER PRIMARY KEY,
   ...> artist TEXT,
   ...> title TEXT,
   ...> year INTEGER,
   ...> genre TEXT,
   ...> filepath TEXT
   ...> );
sqlite>

Then, write a script to iterate over all the MP3 files in a directory tree, identify and extract track metadata, and save it to this SQLite database:

<?php
// open SQLite database
$sqlite = new SQLiteDatabase('music.db');
if (!$sqlite) {
  die('ERROR: Could not open database file!');
}

// initialize a recursive directory iterator
$dir = new RecursiveDirectoryIterator(".");
foreach(new RecursiveIteratorIterator($dir) as $file) {
  // find .mp3 files
  // for each such file, get tags and file path
  // insert into SQLite database as record
  if (preg_match('/\.mp3$/i', $file->getFilename())) {
    $tags = id3_get_tag($file->getPathname(), ID3_V1_0);
    $title = sqlite_escape_string($tags['title']);
    $artist = sqlite_escape_string($tags['artist']);
    $genre = sqlite_escape_string(id3_get_genre_name($tags['genre']));
    $year = sqlite_escape_string($tags['year']);
    // escape the path too, since file names may contain quotes
    $filepath = sqlite_escape_string(realpath($file->getPathname()));
    $sql = "INSERT INTO music (title, artist, genre, year, filepath) VALUES('$title', '$artist', '$genre', '$year', '$filepath')";
    // queryExec() places any error message in its second argument
    if ($sqlite->queryExec($sql, $error) === false) {
      die("ERROR: $error. SQL: $sql");
    }
  }
}

// close SQLite database
unset($sqlite);
echo 'Task complete';
?>

Nothing too complicated here: the script begins by instantiating an SQLiteDatabase object, which can be used to interact with the SQLite database created in the earlier step. Next, it initializes a RecursiveDirectoryIterator to iterate over the directory tree, finding MP3 files and using the id3_get_tag() function to extract track information from each file. This information is then formulated into an SQL query and used with the SQLiteDatabase object’s queryExec() method to save the information to the SQLite database. This process continues until all the directories in the current tree have been processed.

Once the script completes execution, you can look in the SQLite database to see a neatly-organized catalog of MP3 files:

sqlite> SELECT artist, title, genre FROM music;
Mariah Carey   |My All                        |
Cutting Crew   |Died In Your Arms             |Pop
Modern Talking |You're My Heart, You're My Sou|Pop
Mariah Carey   |Emotions                      |Top 40
Mariah Carey   |We Belong Together            |
Unknown        |Usted Me No Conoce            |Salsa
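
One caveat: the SQLiteDatabase class used above belongs to the older SQLite 2 extension, which is no longer bundled with PHP as of version 5.4. If your build lacks it, the same cataloging step can be sketched with PDO and prepared statements, which also removes the need for manual escaping. This sketch assumes the pdo_sqlite driver is enabled; open_music_db() and save_track() are hypothetical helper names, the sample values are made up, and the $tags array is expected in the shape returned by id3_get_tag(), with the genre already converted to a name.

```php
<?php
// Hypothetical helper: open (or create) the catalog and make sure the table exists.
function open_music_db($dsn) {
    $db = new PDO($dsn);
    $db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    $db->exec('CREATE TABLE IF NOT EXISTS music (
        id INTEGER PRIMARY KEY,
        artist TEXT, title TEXT, year INTEGER, genre TEXT, filepath TEXT
    )');
    return $db;
}

// Hypothetical helper: store one track; missing tags are saved as empty strings.
function save_track(PDO $db, array $tags, $filepath) {
    $stmt = $db->prepare(
        'INSERT INTO music (title, artist, genre, year, filepath)
         VALUES (:title, :artist, :genre, :year, :filepath)'
    );
    $row = array(':filepath' => $filepath);
    foreach (array('title', 'artist', 'genre', 'year') as $key) {
        // trim the padding that fixed-width tags tend to carry
        $row[':' . $key] = isset($tags[$key]) ? trim($tags[$key]) : '';
    }
    return $stmt->execute($row);
}

// Usage: an in-memory database keeps the example self-contained;
// pass something like 'sqlite:music.db' to write to disk instead.
$db = open_music_db('sqlite::memory:');
$sample = array('title' => "Cryin'", 'artist' => 'Aerosmith', 'genre' => 'Rock', 'year' => '1998');
save_track($db, $sample, '/music/xyz.mp3'); // sample path, for illustration only
foreach ($db->query('SELECT artist, title, genre FROM music') as $row) {
    echo $row['artist'] . ' | ' . $row['title'] . ' | ' . $row['genre'] . PHP_EOL;
}
```

Inside the directory loop, you would call save_track() once per MP3 file, passing the tags and the file’s real path.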

Want to get a little fancier? How about generating an XML listing for your tracks?

<?php
// send XML header
header("Content-Type: text/xml"); 
echo "<?xml version=\"1.0\" encoding=\"iso-8859-1\"?>";
?>
<collection>        
<?php
// initialize a recursive directory iterator
$dir = new RecursiveDirectoryIterator(".");
foreach(new RecursiveIteratorIterator($dir) as $file) {
  // find .mp3 files
  // for each such file, get tags and file path
  // save as <item> element
  if (preg_match('/\.mp3$/i', $file->getFilename())) {
    $tags = id3_get_tag($file->getPathname(), ID3_V1_0);
?>    
  <item>
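    <title><?php echo htmlspecialchars($tags['title'], ENT_NOQUOTES); ?></title>
    <artist><?php echo htmlspecialchars($tags['artist'], ENT_NOQUOTES); ?></artist>
    <genre><?php echo htmlspecialchars(id3_get_genre_name($tags['genre']), ENT_NOQUOTES); ?></genre>
    <year><?php echo htmlspecialchars($tags['year'], ENT_NOQUOTES); ?></year>
    <filepath><?php echo htmlspecialchars(realpath($file->getPathname()), ENT_NOQUOTES); ?></filepath>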
  </item>
<?php     
  }
}
?>
</collection>

This script sets up an XML template, wherein each track is represented as an <item> of a <collection>. Title, artist, genre and file information is then obtained with the id3_get_tag() function and saved within each <item>.

Here’s an example of the output:

<?xml version="1.0" encoding="iso-8859-1"?>
<collection>        
<item>
    <title>Died In Your Arms</title>
    <artist>Cutting Crew</artist>
    <genre>Pop</genre>
    <year>1990</year>
    <filepath>/dev/06.mp3</filepath>
</item>
<item>
    <title>You're My Heart, You're My Sou</title>
    <artist>Modern Talking</artist>
    <genre>Pop</genre>
    <year>1992</year>
    <filepath>/dev/11.mp3</filepath>
</item>
<item>
    <title>Emotions</title>
    <artist>Mariah Carey</artist>
    <genre>Top 40</genre>
    <year></year>
    <filepath>/dev/14.mp3</filepath>
</item>
<item>
    <title>Usted Me No Conoce</title>
    <artist>Unknown</artist>
    <genre>Salsa</genre>
    <year>    </year>
    <filepath>/dev/Usted.mp3</filepath>
</item>
  ...
</collection>
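
A template like this works well as long as every value is safe to drop into XML; a stray ampersand in a track title is enough to make the document unparseable. If you’d rather not think about escaping at all, PHP’s XMLWriter class can build the same structure. The sketch below is one possible approach, not the only one: tracks_to_xml() is a hypothetical helper, the sample track is made up, and the code assumes the tag values have already been converted to UTF-8.

```php
<?php
// Hypothetical helper: turn an array of track arrays into an XML string.
function tracks_to_xml(array $tracks) {
    $xml = new XMLWriter();
    $xml->openMemory();
    $xml->setIndent(true);
    $xml->startDocument('1.0', 'UTF-8');
    $xml->startElement('collection');
    foreach ($tracks as $track) {
        $xml->startElement('item');
        foreach (array('title', 'artist', 'genre', 'year', 'filepath') as $key) {
            // writeElement() escapes special characters such as & and <
            $xml->writeElement($key, isset($track[$key]) ? trim($track[$key]) : '');
        }
        $xml->endElement(); // closes <item>
    }
    $xml->endElement(); // closes <collection>
    $xml->endDocument();
    return $xml->outputMemory();
}

// Sample data, for illustration only
echo tracks_to_xml(array(
    array('title' => 'Rock & Roll', 'artist' => 'Unknown', 'genre' => 'Rock'),
));
```

In the real script, you would collect one array per MP3 file inside the directory loop and pass the whole list to the helper at the end.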

Pictures and Words

Let’s move on to PHP’s EXIF extension, which does for digital photos much of what the ID3 extension does for audio. EXIF, which is supported by most modern digital cameras, is a specification for adding descriptive information to a photo, including the date and time it was taken, the camera settings used, a copyright notice and a thumbnail image. This information is encoded into EXIF headers and stored within the photo file itself.

PHP’s EXIF extension allows you to read this EXIF information from a digital photo and use it in an application. The function to use is the exif_read_data() function, which accepts a file path as argument and returns a nested associative array of EXIF keys and values. Here’s an example:

<?php
// read and print EXIF headers
$file = 'IMG_0478.JPG';
$exif = exif_read_data($file);
$it = new RecursiveIteratorIterator(new RecursiveArrayIterator($exif));
foreach($it as $key => $value) {
  echo "$key: $value <br/>";  
}
?>

The result is quite a long list. Most of the time, however, you’re going to be interested in the camera make and model; the photo date and time; camera settings such as the shutter speed, aperture and exposure; and the image thumbnail.

Image thumbnail, you say. Where did that come from? Well, the EXIF format also supports embedded thumbnails, which can be used to provide the user with a preview of the actual photograph. This thumbnail can be extracted with the exif_thumbnail() function, as in the following example:

<?php
// extract EXIF thumbnail
$file = 'IMG_0478.JPG';
$th = exif_thumbnail($file);
if ($th !== false) {
  header('Content-Type: image/jpeg');
  print $th;
}
?>

The exif_thumbnail() function can also return the thumbnail image’s height, width and type; these values are stored by reference in (optional) additional variable arguments that can be passed to the exif_thumbnail() function.

Note that thumbnails might be in either JPEG or TIFF format, so the PHP manual suggests letting PHP figure out the thumbnail image type and send the appropriate header, via the image_type_to_mime_type() function.

Here’s a revised version of the previous listing, that incorporates all of this information:

<?php
// extract EXIF thumbnail
$file = 'IMG_0478.JPG';
// $width, $height and $type are filled in by reference; no & is needed at the call site
$th = exif_thumbnail($file, $width, $height, $type);
if ($th !== false) {
  header('Content-Type: ' . image_type_to_mime_type($type));
  print $th;
  exit;
}
?>

Viewing Gallery

With all this information at hand, it’s quite easy to build some useful applications for photo navigation and organization. One of the easiest to build is a photo gallery, which generates a thumbnail view of all the images in a directory and provides additional descriptive information for each image. Here’s the code:

<html>
  <head></head>
  <body>
<?php
// initialize recursive directory iterator
$dir = new RecursiveDirectoryIterator(".");
foreach(new RecursiveIteratorIterator($dir) as $file) {
  // find photo files
  // extract EXIF information
  if (preg_match('/\.(jpe?g|tiff?)$/i', $file->getFilename())) {
    // exif_read_data() returns false for files it cannot parse
    $exif = @exif_read_data($file->getPathname());
    if ($exif !== false && preg_match('/(EXIF|IFD0)/', $exif['SectionsFound'])) {
?>   
    <div style="padding: 5px; border-bottom: 2px solid silver"> 
      <table>
        <tr>
          <td><img src="thumb.php?path=<?php echo urlencode(realpath($file->getPathname())); ?>" /></td>
          <td style="font: 12px Helvetica,sans-serif; vertical-align: top">
            File path: <?php echo realpath($file->getPathname()); ?> <br/>
            Camera: <?php echo $exif['Model']; ?><br/>
            Date and time: <?php echo date('d-M-Y H:i', $exif['FileDateTime']); ?><br/>
            Exposure: <?php echo $exif['ExposureBiasValue']; ?> |
            Exposure time: <?php echo $exif['ExposureTime']; ?><br/>
            F-number: <?php echo $exif['FNumber']; ?> | 
            Aperture: <?php echo $exif['ApertureValue']; ?><br/>
            Shutter speed: <?php echo $exif['ShutterSpeedValue']; ?> |
            Focal length: <?php echo $exif['FocalLength']; ?><br/>
          </td>
        </tr>
      </table>
    </div>
<?php    
    }
  }
}
?>
  </body>
</html>

This script initializes a RecursiveDirectoryIterator to iterate over all the files in the current directory and its children, processing only those with JPEG or TIFF extensions. The exif_read_data() function is used on each such file, and the information extracted from each image is formatted and displayed wherever possible.
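
One detail to be aware of when displaying these values: exif_read_data() typically returns numeric settings such as FNumber and ExposureTime as fraction strings (for example "28/10" or "1/60") rather than as decimal numbers. A small conversion helper makes them easier to format or sort. The function below is a hypothetical helper written for this article, not part of the EXIF extension.

```php
<?php
// Hypothetical helper: convert an EXIF rational such as "28/10" to a float.
// Returns null when the value cannot be interpreted.
function exif_rational_to_float($value) {
    if (is_string($value) && preg_match('#^\s*(-?\d+)\s*/\s*(-?\d+)\s*$#', $value, $m)) {
        $denominator = (int) $m[2];
        // guard against malformed values with a zero denominator
        return ($denominator === 0) ? null : (float) $m[1] / $denominator;
    }
    // plain numbers pass through unchanged, as floats
    return is_numeric($value) ? (float) $value : null;
}

printf("F-number: f/%.1f\n", exif_rational_to_float('28/10'));     // prints "F-number: f/2.8"
printf("Exposure time: %.4f s\n", exif_rational_to_float('1/60')); // prints "Exposure time: 0.0167 s"
```

In the gallery script, you could wrap values like $exif['FNumber'] in this helper before echoing them.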

A thumbnail is also generated for each image, via the thumb.php script. This script receives the image file path via the GET method, and passes it to the exif_thumbnail() function to extract and display the corresponding thumbnail image, where available. You’ve seen something similar in action earlier; here’s what this version looks like:

<?php
// get and display thumbnail for specified file
if (isset($_GET['path'])) {
  // values in $_GET are already URL-decoded by PHP
  if (file_exists($_GET['path'])) {
    $image = exif_thumbnail($_GET['path'], $width, $height, $type);
    if ($image !== false) {
      header('Content-type: ' .image_type_to_mime_type($type));
      print $image;
      exit;      
    }
  }
}
?>
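
Because thumb.php takes a file path straight from the query string, anyone who can reach the script can point it at any file the Web server is able to read. One possible precaution, sketched below, is to resolve the requested path with realpath() and refuse anything that falls outside the photo directory. The resolve_inside() helper and the /var/www/photos base directory are both assumptions made for this example; substitute your own photo root.

```php
<?php
// Hypothetical helper: return the canonical path if $path lies inside $baseDir,
// or false if it is empty, does not exist, or points elsewhere.
function resolve_inside($path, $baseDir) {
    if ($path === '') {
        return false;
    }
    $base = realpath($baseDir);
    $real = realpath($path);
    if ($base === false || $real === false) {
        return false;
    }
    // compare against the base directory with a trailing separator,
    // so that /photos-private does not pass as being inside /photos
    $base = rtrim($base, DIRECTORY_SEPARATOR) . DIRECTORY_SEPARATOR;
    return (strpos($real, $base) === 0) ? $real : false;
}

// Usage sketch for thumb.php; '/var/www/photos' is a placeholder photo root
$requested = isset($_GET['path']) ? $_GET['path'] : '';
$safePath = resolve_inside($requested, '/var/www/photos');
if ($safePath !== false) {
    // pass $safePath to exif_thumbnail() here instead of the raw request value
}
```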

Another common application involves extracting all the thumbnail images from the source photos for use in an index or Web page. Here’s an example of how this could be done:

<?php
// recursively process directories
$dir = new RecursiveDirectoryIterator(".");
foreach(new RecursiveIteratorIterator($dir) as $file) {
  // extract thumbnails
  if (preg_match('/\.(jpe?g|tiff?)$/i', $file->getFilename())) {
    $thumbData = exif_thumbnail($file->getPathname(), $width, $height, $type);
    if ($thumbData !== false) {
      // decide if TIFF or JPEG format
      // save as new file
      $mime = image_type_to_mime_type($type);
      $ext = ($mime == 'image/tiff') ? '.tif' : '.jpg';          
      $outfile = substr(realpath($file->getPathname()), 0, strrpos(realpath($file->getPathname()), '.')) . '_thumb' . $ext; 
      file_put_contents($outfile, $thumbData) or die("Cannot write file: $outfile");
    }
  }
}
?>

Here, the exif_thumbnail() function is used in combination with a RecursiveDirectoryIterator to extract thumbnail images and save them to separate files with file_put_contents().
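
The substr() and strrpos() expression that builds the output file name does the job, but it is a little hard to read. As an alternative, here is a sketch of the same naming rule using PHP’s pathinfo() function and the IMAGETYPE constants that exif_thumbnail() reports; thumbnail_path() is a hypothetical helper name, and the sample path is made up.

```php
<?php
// Hypothetical helper: build "<name>_thumb.<ext>" next to the source file.
// $type is the image type constant reported by exif_thumbnail().
function thumbnail_path($source, $type) {
    $info = pathinfo($source);
    // thumbnails are either TIFF or JPEG; treat anything non-TIFF as JPEG
    $isTiff = ($type === IMAGETYPE_TIFF_II || $type === IMAGETYPE_TIFF_MM);
    $ext = $isTiff ? 'tif' : 'jpg';
    return $info['dirname'] . DIRECTORY_SEPARATOR . $info['filename'] . '_thumb.' . $ext;
}

// On a *NIX system this prints /photos/capri/IMG_0478_thumb.jpg
echo thumbnail_path('/photos/capri/IMG_0478.JPG', IMAGETYPE_JPEG) . PHP_EOL;
```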

As these examples illustrate, PHP’s ID3 and EXIF extensions make it fairly easy to extract metadata from audio and image files, and use this data in different applications. Try them out for yourself sometime, and see what you can come up with!

Copyright Melonfire, 2008. All rights reserved.

About Vikram Vaswani

Vikram Vaswani is the founder and CEO of "Melonfire":http://www.melonfire.com/, a consultancy specializing in open-source tools and technologies. He is a passionate proponent of the open-source movement and frequently contributes articles and tutorials on open-source technologies, including Perl, Python, PHP, MySQL, and Linux, to the community at large. He is the author of four books on PHP and MySQL, including "MySQL: The Complete Reference":http://www.mysql-tcr.com/, "How to Do Everything with PHP and MySQL":http://www.everythingphpmysql.com/ and "PHP Programming Solutions":http://www.php-programming-solutions.com/. Vikram has more than eight years of experience working with PHP and MySQL as an application developer. He is the author of Zend Technologies' "PHP 101 series":http://devzone.zend.com/tag/PHP101 for PHP beginners, and has extensive experience deploying PHP in a variety of different environments (including corporate intranets, high-traffic Internet Web sites, and mission-critical thin client applications). A Felix Scholar at the University of Oxford, England, Vikram combines his interest in Web application development with various other activities. When not dreaming up plans for world domination, he amuses himself by reading crime fiction, watching old movies, playing squash, blogging, and keeping an eye out for unfriendly Agents.

2 Responses to “Reading and Writing MP3 and Photo Metadata with PECL”

  1. _____anonymous_____ Says:

    http://pel.sourceforge.net/
    allows you to get and set exif data for images, but can have memory issues if dealing with a large number of files. the api is quite nice though.

  2. _____anonymous_____ Says:

    have you ever used ktaglib? It offers also saving images but in fact from ID3v2. I’m not sure if id3 is able to do so. It also gives you a object oriented interface.