PDF Generation Using Only PHP – Part 1

December 23, 2003


Intended Audience
Overview
Learning Objectives
Prerequisites
How It Works
Setting up Class Variables
The Factory Method

Writing Content
Starting the Document
Adding a Page
Output of Simple Text
Closing the Document
Document output
The Script
The complete class
Example Use

About the Author

Intended Audience


This tutorial is intended for the PHP programmer who needs to
incorporate PDF generation in a script without using external libraries such as
PDFlib (often unavailable due to licensing restrictions or lack of
funds).

This tutorial covers only the basics, which should give you a good
start. PDF has a vast set of features and possibilities which cannot be
covered in a short tutorial. If you need more than what is covered here,
look at similar yet more complete solutions, such as the excellent FPDF
class by Olivier Plathey (http://fpdf.org), on which this tutorial is
based.

Of course, you may wish to take your own route, and for that
there is also the PDF reference (be warned: it’s 1,172 pages!).

Basic familiarity with using PHP classes is assumed. Knowledge of PDF file structure is not required, as all references are explained.

Overview


PDF files are, for the most part, plain text files with a specific
syntax that describes the objects within the document, such as text and
images, and what should happen to them. Some parts, such as compressed page
content, are stored as binary data. It follows that, armed with some knowledge
of that syntax, anyone can create a PDF file. This tutorial shows you the basic
features of the PDF language so that you can put together your own PDF document.

Learning Objectives


At the end of this first part of the tutorial you should be
able to put together a simple PDF class that can:

  • Create and output a PDF document;
  • Set up page size and orientation;
  • Insert simple text into the page;
  • Handle simple font attributes;
  • Activate compression.


Prerequisites


You need to have a fully functional PHP install (either PHP 4
or PHP 5 will work here) and a running web server to output the PDF file from your script.

Acrobat Reader, XPDF, or an equivalent is required to see the results of your work.

You do not need any external library, either separate or compiled into PHP, to generate your PDF files.

How It Works


The best approach is to set the code up as a class. This
allows for greater flexibility later.

The primary (public) methods deal with the main operations on
a PDF document: setting it up, adding pages, setting font, adding text,
activating compression, and output of the document.

We shall review the various methods and features of the PDF
language, and then eventually put it all together as one class.

Setting up Class Variables


We will need a few class variables to keep track of output, pages, objects, settings, etc.

The following is a list of the essential variables, with brief
comments. You will later see each one of these variables in its context, which
will give you a better idea of how they are used. For now just briefly get
yourself acquainted with them.



var $_buffer = '';          // Buffer holding in-memory PDF.
var $_state = 0;            // Current document state.
var $_page = 0;             // Current page number.
var $_n = 2;                // Current object number.
var $_offsets = array();    // Array of object offsets.
var $_pages = array();      // Array containing the pages.
var $_w;                    // Page width in points.
var $_h;                    // Page height in points.
var $_fonts = array();      // An array of used fonts.
var $_font_family = '';     // Current font family.
var $_font_style = '';      // Current font style.
var $_current_font;         // Array with current font info.
var $_font_size = 12;       // Current font size in points.
var $_compress;             // Flag to compress or not.
var $_core_fonts = array('courier'      => 'Courier',
                         'courierB'     => 'Courier-Bold',
                         'courierI'     => 'Courier-Oblique',
                         'courierBI'    => 'Courier-BoldOblique',
                         'helvetica'    => 'Helvetica',
                         'helveticaB'   => 'Helvetica-Bold',
                         'helveticaI'   => 'Helvetica-Oblique',
                         'helveticaBI'  => 'Helvetica-BoldOblique',
                         'times'        => 'Times-Roman',
                         'timesB'       => 'Times-Bold',
                         'timesI'       => 'Times-Italic',
                         'timesBI'      => 'Times-BoldItalic',
                         'symbol'       => 'Symbol',
                         'zapfdingbats' => 'ZapfDingbats');


The Factory Method


This method will give us the PDF object with which we can
build our document. It sets the initial values for the document, such as page
orientation and size, and returns the object.


function &factory($orientation = 'P', $format = 'A4')
{
    /* Create the PDF object. */
    $pdf = &new PDF();

    /* Page format. */
    $format = strtolower($format);
    if ($format == 'a3') {           // A3 page size.
        $format = array(841.89, 1190.55);
    } elseif ($format == 'a4') {     // A4 page size.
        $format = array(595.28, 841.89);
    } elseif ($format == 'a5') {     // A5 page size.
        $format = array(420.94, 595.28);
    } elseif ($format == 'letter') { // Letter page size.
        $format = array(612, 792);
    } elseif ($format == 'legal') {  // Legal page size.
        $format = array(612, 1008);
    } else {
        die(sprintf('Unknown page format: %s', $format));
    }
    $pdf->_w = $format[0];
    $pdf->_h = $format[1];

    /* Page orientation. */
    $orientation = strtolower($orientation);
    if ($orientation == 'l' || $orientation == 'landscape') {
        $w = $pdf->_w;
        $pdf->_w = $pdf->_h;
        $pdf->_h = $w;
    } elseif ($orientation != 'p' && $orientation != 'portrait') {
        die(sprintf('Incorrect orientation: %s', $orientation));
    }

    /* Turn on compression by default. */
    $pdf->setCompression(true);

    return $pdf;
}
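
The page sizes used in the factory method are expressed in points, where one point is 1/72 of an inch. As a rough illustration of where those numbers come from, the following sketch converts the A4 dimensions (210 mm by 297 mm) into points. The mm_to_points() helper is hypothetical and is not part of the tutorial's class.

```php
<?php
/* Hypothetical helper, not part of the PDF class: convert
 * millimetres to PDF points (1 point = 1/72 inch, 1 inch = 25.4 mm). */
function mm_to_points($mm)
{
    return round($mm / 25.4 * 72, 2);
}

/* A4 paper measures 210 mm x 297 mm. */
printf("%.2f x %.2f\n", mm_to_points(210), mm_to_points(297));
// 595.28 x 841.89
```

The same calculation could be used to add further page formats to the factory method.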


Also in this method we turn on compression by default. This makes
the output PDF files a lot smaller. The actual setCompression() method
is as follows:


function setCompression($compress)
{
    /* If no gzcompress function is available then default to
     * false. */
    $this->_compress = (function_exists('gzcompress') ? $compress : false);
}


However, whilst learning you may wish to explicitly
turn off compression, so that you can open your created PDF document with
a text editor and see easily what is happening.
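
To get a feel for what compression does to page content, here is a small stand-alone sketch. It assumes the zlib extension is available (the same check the class makes) and uses a made-up piece of page content; the exact compressed size will vary.

```php
<?php
/* Made-up page content, repeated so that the compressor has
 * something to work on. */
$content = str_repeat("BT 100.00 741.89 Td (Hello) Tj ET\n", 50);

if (function_exists('gzcompress')) {
    $packed = gzcompress($content);
    printf("Plain: %d bytes, compressed: %d bytes\n",
           strlen($content), strlen($packed));

    /* gzuncompress() restores the original text exactly. */
    var_dump(gzuncompress($packed) === $content);
} else {
    echo "zlib is not available, content would stay uncompressed.\n";
}
```

The compressed data is binary, which is why a compressed PDF is hard to read in a text editor.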


Writing Content


We will not be writing directly to the PDF file; instead, the content
is buffered as it is created. Only after the PDF document is closed,
and after some rearranging, will it be sent as a PDF file to the browser for
download. So, as a first step, we need to create a function to buffer the
output. As it will be used internally within the PDF class, let’s make it a
private function.


function _out($s)
{
    if ($this->_state == 2) {
        $this->_pages[$this->_page] .= $s . "\n";
    } else {
        $this->_buffer .= $s . "\n";
    }
}


Here you can see straight away a number of class
variables being used. Let’s take a moment to work through them. The
$_state variable keeps track of four
different states that the PDF document can be in:

0 = initialised

1 = opened but no page opened

2 = page opened

3 = document closed

The state is important in this method for determining how to
buffer the output. If there is an open page, output is sent to the
$_pages array. For any other state it is sent to the main buffer held
in the $_buffer variable.

This distinction is necessary because page content is handled
as a separate object within PDF and hence will need extra work on it when it is
finally written to the main buffer.

As you will later see, the
$_state variable is used elsewhere to
similarly add logic according to the document state.

It is recommended to use the newline (“\n”)
following each output, as it is required in some cases (for example certain PDF
instructions have to begin on a new line). Also, remember that PDF is case
sensitive, so always follow the exact spelling of PDF syntax.
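
The routing described above can be tried out in isolation. The following is only a sketch: it replaces the class with a plain array (an assumption made to keep the example short) and uses a stand-alone out() function in place of the private _out() method.

```php
<?php
/* Simplified stand-in for the class: an array holds the state,
 * the main buffer and the per-page buffers. */
$doc = array('state' => 1, 'page' => 0, 'buffer' => '', 'pages' => array());

function out(&$doc, $s)
{
    if ($doc['state'] == 2) {
        $doc['pages'][$doc['page']] .= $s . "\n";  // A page is open.
    } else {
        $doc['buffer'] .= $s . "\n";               // Any other state.
    }
}

out($doc, '%PDF-1.3');              // State 1: goes to the main buffer.

$doc['page']++;                     // Open page 1.
$doc['pages'][$doc['page']] = '';
$doc['state'] = 2;
out($doc, 'BT (Hello) Tj ET');      // State 2: goes to the page buffer.

echo $doc['buffer'];                // %PDF-1.3
echo $doc['pages'][1];              // BT (Hello) Tj ET
```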


Starting the Document


The following two lines of code are required to initialise the
document. They must be called before any output:


function open()
{
    $this->_state = 1;          // Set state to opened.
    $this->_out('%PDF-1.3');    // Output the PDF header.
}

The second line writes the initial header that is
required to identify the file and the PDF version being followed. The version
number helps PDF readers handle the file properly.
This tutorial will not be covering anything exotic, so you
might as well stick with version 1.3. If you do start incorporating the more
advanced PDF features found in 1.5 you will need to change the version
number.


Adding a Page


We can now add a page to our document. The following code is
quite straightforward.

One point worth noting is the $_font_family check. For any text to be
written to a page we need to set the font. However, we have to take into account
the possibility that the font was set before any page was added, or that the
font was set for a previous page in the current document. Either way we need to
check the font class variable, and output the font information to the page. The
function setFont() is used for this,
which we shall cover later.


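function addPage()
{
    $this->_page++;                   // Increment page count.
    $this->_pages[$this->_page] = ''; // Start the page buffer.
    $this->_state = 2;                // Set state to page
                                      // opened.

    /* Check if font has been set before this page. */
    if ($this->_font_family) {
        /* Clear the current family first, otherwise setFont()
         * would see an unchanged font, return early and write
         * nothing to the new page. */
        $family = $this->_font_family;
        $this->_font_family = '';
        $this->setFont($family, $this->_font_style, $this->_font_size);
    }
}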



Output of Simple Text


As mentioned earlier, before any text can be output, font
information must be supplied. We therefore need a function to define which font
will be used. PDF specifications offer a core set of fonts which can be used
with no extra information supplied to the PDF reader. You can also embed your
own custom fonts into a PDF file, but for this you need to create font
definitions, which are beyond the scope of this tutorial.

For now, limit your output to the following fonts:

Courier, Courier-Bold, Courier-Oblique, Courier-BoldOblique;
Helvetica, Helvetica-Bold, Helvetica-Oblique, Helvetica-BoldOblique;
Times-Roman, Times-Bold, Times-Italic, Times-BoldItalic;
Symbol;
ZapfDingbats.

The following method sets the font family name, and also
(optionally) a style such as bold, italic or both, and a font size.


function setFont($family, $style = '', $size = null)
{
    $family = strtolower($family);
    if ($family == 'arial') {               // Use helvetica.
        $family = 'helvetica';
    } elseif ($family == 'symbol' ||        // No styles for
              $family == 'zapfdingbats') {  // these two fonts.
        $style = '';
    }

    $style = strtoupper($style);
    if ($style == 'IB') {                   // Accept any order
        $style = 'BI';                      // of B and I.
    }

    if (is_null($size)) {                   // No size specified,
        $size = $this->_font_size;          // use current size.
    }

    if ($this->_font_family == $family &&   // If font is already
        $this->_font_style == $style &&     // current font
        $this->_font_size == $size) {       // simply return.
        return;
    }

    /* Set the font key. */
    $fontkey = $family . $style;
    if (!isset($this->_fonts[$fontkey])) {  // Test if cached.
        /* Only the core fonts are supported. */
        if (!isset($this->_core_fonts[$fontkey])) {
            die(sprintf('Undefined font: %s %s', $family, $style));
        }
        $i = count($this->_fonts) + 1;      // Increment font
        $this->_fonts[$fontkey] = array(    // object count and
            'i'    => $i,                   // store cache.
            'name' => $this->_core_fonts[$fontkey]);
    }

    /* Store current font information. */
    $this->_font_family  = $family;
    $this->_font_style   = $style;
    $this->_font_size    = $size;
    $this->_current_font = $this->_fonts[$fontkey];

    /* Output font information if at least one page has been
     * defined. */
    if ($this->_page > 0) {
        $this->_out(sprintf('BT /F%d %.2f Tf ET', $this->_current_font['i'], $this->_font_size));
    }
}
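
To see how a family and style are turned into a key of the $_core_fonts array, the normalisation steps can be pulled out into a stand-alone function. This is only an illustrative sketch; font_key() is a hypothetical helper and does not exist in the class.

```php
<?php
/* Hypothetical helper mirroring the normalisation in setFont(). */
function font_key($family, $style = '')
{
    $family = strtolower($family);
    if ($family == 'arial') {
        $family = 'helvetica';              // Arial maps to Helvetica.
    } elseif ($family == 'symbol' || $family == 'zapfdingbats') {
        $style = '';                        // No styles for these two.
    }

    $style = strtoupper($style);
    if ($style == 'IB') {
        $style = 'BI';                      // Accept any order of B and I.
    }

    return $family . $style;
}

echo font_key('Arial', 'ib'), "\n";         // helveticaBI
echo font_key('Times'), "\n";               // times
echo font_key('ZapfDingbats', 'B'), "\n";   // zapfdingbats

/* The operator written to the page for font number 1 at 8 points. */
echo sprintf('BT /F%d %.2f Tf ET', 1, 8), "\n";   // BT /F1 8.00 Tf ET
```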


The following method makes it easier to change the font size
without having to go through the whole setFont() function.


function setFontSize($size)
{
    if ($this->_font_size == $size) {   // If already current
        return;                         // size simply return.
    }

    $this->_font_size = $size;          // Set the font size.

    /* Output font information if at least one page has been
     * defined and a font has already been set. */
    if ($this->_page > 0 && isset($this->_current_font)) {
        $this->_out(sprintf('BT /F%d %.2f Tf ET',
                            $this->_current_font['i'],
                            $this->_font_size));
    }
}


And now to actually output the text.

You will need to pass to this method the x/y position of your
text, as well as the actual text.


function text($x, $y, $text)
{
    $text = $this->_escape($text);    // Escape any harmful
                                      // characters.
    $out = sprintf('BT %.2f %.2f Td (%s) Tj ET',
                   $x, $this->_h - $y, $text);
    $this->_out($out);
}




Note how for simplicity we allow the user to specify
the y position measured from the top edge of the paper (whereas in fact
PDF measures from the bottom). To achieve this we subtract the y value from the
page height in the actual code ($this->_h - $y).

Also note how actual text needs to be escaped to ensure that
it is safely inserted into the file. Since text in the PDF file is denoted using
parentheses around it, any parentheses in the text itself should be escaped.

The best solution is to create a separate function to handle
any cases when text needs to be inserted safely. This will be used a couple of
times in this tutorial, but it will also be useful if you add more functionality
to this class later.


function _escape($s)
{
    $s = str_replace('\\', '\\\\', $s);   // Escape any '\\'
    $s = str_replace('(', '\\(', $s);     // Escape any '('
    return str_replace(')', '\\)', $s);   // Escape any ')'
}
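
Putting the coordinate flip and the escaping together, the following stand-alone sketch shows the line that ends up in the page content for a piece of text containing parentheses. It assumes an A4 portrait page, and pdf_escape() is simply a copy of the escaping logic outside the class.

```php
<?php
/* Stand-alone copy of the escaping logic from _escape(). */
function pdf_escape($s)
{
    $s = str_replace('\\', '\\\\', $s);
    $s = str_replace('(', '\\(', $s);
    return str_replace(')', '\\)', $s);
}

$page_height = 841.89;   // A4 portrait height in points (assumed).
$x = 100;
$y = 100;                // Measured from the top edge of the page.

echo sprintf('BT %.2f %.2f Td (%s) Tj ET',
             $x, $page_height - $y, pdf_escape('Hello (world)')), "\n";
// BT 100.00 741.89 Td (Hello \(world\)) Tj ET
```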


Closing the Document


The closing function is a bit more involved: we need to clean
up a bit, set some PDF tags, and create a few references. This is the code that
does most of the work in setting up the buffered content to finally look like a
PDF file.

Begin by checking that there is at least one page, and setting the state to “page closed”.


function close()
{
    if ($this->_page == 0) {    // If not yet initialised, add
        $this->addPage();       // one page to make this a valid
    }                           // PDF.

    $this->_state = 1;          // Set the state to page closed.

Now output the couple of objects that we have been buffering separately: pages and
other resources. We shall go through each method later.


    /* Pages and resources. */
    $this->_putPages();
    $this->_putResources();

Include some document information. PDF treats this information
as a separate object, and here we introduce the _newobj() function. You could add other
information to this section, such as author, subject, title, keywords, etc. For now we’ll just put in the producer.


    /* Print some document info. */
    $this->_newobj();
    $this->_out('<<');
    $this->_out('/Producer (My First PDF Class)');
    $this->_out(sprintf('/CreationDate (D:%s)',
                        date('YmdHis')));
    $this->_out('>>');
    $this->_out('endobj');


The next section is the PDF catalog, which defines how the document will initially look in the reader. You
can take this as it is for now. There’s nothing exciting going on here, but it is needed.


    /* Print catalog. */
    $this->_newobj();
    $this->_out('<<');
    $this->_out('/Type /Catalog');
    $this->_out('/Pages 1 0 R');
    $this->_out('/OpenAction [3 0 R /FitH null]');
    $this->_out('/PageLayout /OneColumn');
    $this->_out('>>');
    $this->_out('endobj');


The cross reference section is very important. It brings into use the
$_offsets array that has appeared before. PDF stores a byte offset for
every object in the document. This allows the PDF reader to access objects
at random, without having to load the entire document.


    /* Print cross reference. */
    $start_xref = strlen($this->_buffer); // Get the xref offset.
    $this->_out('xref');                  // Announce the xref.
    $this->_out('0 ' . ($this->_n + 1));  // Number of objects.
    $this->_out('0000000000 65535 f ');

    /* Loop through all objects and output their offset. */
    for ($i = 1; $i <= $this->_n; $i++) {
        $this->_out(sprintf('%010d 00000 n ', $this->_offsets[$i]));
    }

Each object is printed on a separate line. The
offset for each object is printed as a 10 digit integer, followed by a
generation number, and an in-use/free indicator. You need not worry about either
the generation number or the in-use/free indicator, since those will only be
used if editing PDF files and deleting objects. Since we shall be generating
from scratch, the generation number will always be 00000 and the in use
indicator will be set to ‘n’.
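
As a quick check of the entry format, the sketch below builds one cross reference entry for an object at byte offset 9. Each entry, including its line ending, is exactly 20 bytes long, which is why the format string ends with a space before the newline is added. The xref_entry() function is a hypothetical stand-alone helper.

```php
<?php
/* Format one in-use cross reference entry for a byte offset. */
function xref_entry($offset)
{
    return sprintf('%010d 00000 n ', $offset) . "\n";
}

$entry = xref_entry(9);
echo $entry;                 // 0000000009 00000 n
echo strlen($entry), "\n";   // 20
```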

The final lines to be printed are the PDF trailer.


    /* Print trailer. */
    $this->_out('trailer');
    $this->_out('<<');

    /* The total number of objects. */
    $this->_out('/Size ' . ($this->_n + 1));

    /* The root object. */
    $this->_out('/Root ' . $this->_n . ' 0 R');

    /* The document information object. */
    $this->_out('/Info ' . ($this->_n - 1) . ' 0 R');
    $this->_out('>>');
    $this->_out('startxref');
    $this->_out($start_xref);  // Where to find the xref.
    $this->_out('%%EOF');

    $this->_state = 3;         // Set the document state to
                               // closed.
}

Now let’s look at the new functions we’ve met in this
document closing method. The _newobj() function above is used simply
to keep track of objects added to the document.


function _newobj()
{
    /* Increment the object count. */
    $this->_n++;

    /* Save the byte offset of this object. */
    $this->_offsets[$this->_n] = strlen($this->_buffer);

    /* Output to buffer. */
    $this->_out($this->_n . ' 0 obj');
}
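
The offset bookkeeping can be illustrated without the class. In this sketch, which uses two empty placeholder objects rather than real PDF content, each offset is simply the length of the buffer at the moment the object starts.

```php
<?php
$buffer  = "%PDF-1.3\n";   // The header occupies the first 9 bytes.
$offsets = array();

/* Object 3 starts wherever the buffer currently ends. */
$offsets[3] = strlen($buffer);
$buffer .= "3 0 obj\n<<>>\nendobj\n";

/* Object 4 starts after the header and object 3. */
$offsets[4] = strlen($buffer);
$buffer .= "4 0 obj\n<<>>\nendobj\n";

print_r($offsets);

/* Reading from a stored offset lands on the start of the object. */
echo substr($buffer, $offsets[4], 7), "\n";   // 4 0 obj
```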


The _putPages() function handles the output of the page content. Here
we go through the $_pages array that has been buffering the page content
separately, and output it to the main buffer.

If compression is required, page content is passed through the
gzcompress() function before being written to output. Here you can also
see why the $_n object counter starts from 2. We reserve object number 1 for
the root pages object (the parent of every page), and later you will see that
we reserve object number 2 for resources. This makes it easier for us to
reference them when required, for example in each page object.


function _putPages()
{
    /* If compression is required set the compression tag. */
    $filter = ($this->_compress) ? '/Filter /FlateDecode ' : '';

    /* Print out pages, loop through each. */
    for ($n = 1; $n <= $this->_page; $n++) {
        $this->_newobj();                 // Start a new object.
        $this->_out('<</Type /Page');     // Object type.
        $this->_out('/Parent 1 0 R');
        $this->_out('/Resources 2 0 R');
        $this->_out('/Contents ' . ($this->_n + 1) . ' 0 R>>');
        $this->_out('endobj');

        /* If compression required gzcompress() the page content. */
        $p = ($this->_compress) ? gzcompress($this->_pages[$n]) : $this->_pages[$n];

        /* Output the page content. */
        $this->_newobj();                 // Start a new object.
        $this->_out('<<' . $filter . '/Length ' . strlen($p) . '>>');
        $this->_putStream($p);            // Output the page.
        $this->_out('endobj');
    }

    /* Set the offset of the first object. */
    $this->_offsets[1] = strlen($this->_buffer);
    $this->_out('1 0 obj');
    $this->_out('<</Type /Pages');
    $kids = '/Kids [';
    for ($i = 0; $i < $this->_page; $i++) {
        $kids .= (3 + 2 * $i) . ' 0 R ';
    }
    $this->_out($kids . ']');
    $this->_out('/Count ' . $this->_page);

    /* Output the page size. */
    $this->_out(sprintf('/MediaBox [0 0 %.2f %.2f]',
                        $this->_w, $this->_h));
    $this->_out('>>');
    $this->_out('endobj');
}
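
Because every page takes up two objects (the page itself and its content stream) and numbering starts at 3, page objects always have odd numbers: 3, 5, 7 and so on. The following sketch reproduces the /Kids loop on its own to show this; kids_entry() is a hypothetical helper used only for illustration.

```php
<?php
/* Hypothetical helper: build the /Kids entry for a number of pages. */
function kids_entry($page_count)
{
    $kids = '/Kids [';
    for ($i = 0; $i < $page_count; $i++) {
        /* Page $i + 1 is object 3 + 2 * $i, its content is the next one. */
        $kids .= (3 + 2 * $i) . ' 0 R ';
    }
    return $kids . ']';
}

echo kids_entry(3), "\n";   // /Kids [3 0 R 5 0 R 7 0 R ]
```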


Let’s look at another method now: _putStream(). We could have
included the code in the _putPages() function itself; however, since this
method is required for other objects (such as images), we might as well
separate it out now.


function _putStream($s)
{
    $this->_out('stream');
    $this->_out($s);
    $this->_out('endstream');
}


Whilst the content streams define the objects on a page,
sometimes they need to reference objects outside the content stream. These are
called resources. Resources are named objects such as font information or image
data. The following method includes any resources defined so far into the main
buffer. In this first part of the tutorial fonts are the only resources we will
be dealing with.


function _putResources()
{
    /* Output any fonts. */
    $this->_putFonts();

    /* Resources are always object number 2. */
    $this->_offsets[2] = strlen($this->_buffer);
    $this->_out('2 0 obj');
    $this->_out('<</ProcSet [/PDF /Text]');
    $this->_out('/Font <<');
    foreach ($this->_fonts as $font) {
        $this->_out('/F' . $font['i'] . ' ' . $font['n'] . ' 0 R');
    }
    $this->_out('>>');
    $this->_out('>>');
    $this->_out('endobj');
}

This last private function, called in the above
_putResources() method, includes any
font names into the PDF file. As we are only covering core fonts in this
tutorial, nothing more than listing the font names is done here.


function _putFonts()
{
    /* Print out font details. */
    foreach ($this->_fonts as $k => $font) {
        $this->_newobj();
        $this->_fonts[$k]['n'] = $this->_n;
        $name = $font['name'];
        $this->_out('<</Type /Font');
        $this->_out('/BaseFont /' . $name);
        $this->_out('/Subtype /Type1');
        if ($name != 'Symbol' && $name != 'ZapfDingbats') {
            $this->_out('/Encoding /WinAnsiEncoding');
        }
        $this->_out('>>');
        $this->_out('endobj');
    }
}



Document output


The following function, the actual output of the document,
does nothing more than make sure the document is closed, send a few headers
according to browser type, and echo the buffered data.


function output($filename)
{
    if ($this->_state < 3) {    // If document not yet closed
        $this->close();         // close it now.
    }

    /* Make sure no content already sent. */
    if (headers_sent()) {
        die('Unable to send PDF file, some data has already been output to browser.');
    }

    /* Offer file for download and do some browser checks
     * for correct download. */
    $agent = isset($_SERVER['HTTP_USER_AGENT']) ? trim($_SERVER['HTTP_USER_AGENT']) : '';
    if (preg_match('|MSIE ([0-9.]+)|', $agent, $version) ||
        preg_match('|Internet Explorer/([0-9.]+)|', $agent, $version)) {
        header('Content-Type: application/x-msdownload');
        header('Content-Length: ' . strlen($this->_buffer));
        /* $version is an array of matches; the version number
         * itself is held in $version[1]. */
        if ($version[1] == '5.5') {
            header('Content-Disposition: filename="' . $filename . '"');
        } else {
            header('Content-Disposition: attachment; filename="' . $filename . '"');
        }
    } else {
        header('Content-Type: application/pdf');
        header('Content-Length: ' . strlen($this->_buffer));
        header('Content-Disposition: attachment; filename="' . $filename . '"');
    }
    echo $this->_buffer;
}
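
When checking the browser version, keep in mind that preg_match() fills its third argument with an array of matches rather than a single string. The sketch below demonstrates this with a made-up user agent string, so the value used here is an assumption and not something a particular browser is guaranteed to send.

```php
<?php
/* Made-up sample value for illustration. */
$agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)';

if (preg_match('|MSIE ([0-9.]+)|', $agent, $version)) {
    /* $version[0] is the whole match, $version[1] the first group. */
    echo $version[0], "\n";   // MSIE 5.5
    echo $version[1], "\n";   // 5.5
}
```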


The Script



The complete class

You can download the entire class for use with Part 1 of this tutorial.


Example Use




<?php

require 'PDF.php';                    // Require the lib.
$pdf = &PDF::factory('p', 'a4');      // Set up the pdf object.
$pdf->open();                         // Start the document.
$pdf->setCompression(true);           // Activate compression.
$pdf->addPage();                      // Start a page.
$pdf->setFont('Courier', '', 8);      // Set font to courier 8 pt.
$pdf->text(100, 100, 'First page');   // Text at x=100 and y=100.
$pdf->setFontSize(20);                // Set font size to 20 pt.
$pdf->text(100, 200, 'HELLO WORLD!'); // Text at x=100 and y=200.
$pdf->addPage();                      // Add a new page.
$pdf->setFont('Arial', 'BI', 12);     // Set font to arial bold italic 12 pt.
$pdf->text(100, 100, 'Second page');  // Text at x=100 and y=100.
$pdf->output('foo.pdf');              // Output the file named foo.pdf.

?>

About the Author


Marko Djukic lives and works in Florence, Italy, running his
own company (http://oblo.com) with the goal of bringing innovative
Open Source solutions to local government and SMEs. He is
also a core developer for the Horde Project (http://horde.org).

Marko can be reached directly at marko@oblo.com.

Responses to “PDF Generation Using Only PHP – Part 1”

  1. danmac Says:

    There is a problem with lineWidth and addPage. When addPage is called and a lineWidth was previously set, you have:

    if ($this->_line_width != 1) {
        $this->_out($this->_line_width);
    }

    However, _line_width is just a number, and won’t output correctly, e.g.:

    .8

    instead of:

    .80 w

    I changed mine to:

    if ($this->_line_width != 1) {
        $this->setLineWidth($this->_line_width);
    }

    since setLineWidth uses the _out function, outputting the line width correctly.

  2. uweerakoon Says:

    The URL http://www.zend.com/zend/tut/PDF1.zip redirects me to http://devzone.zend.com/public/view/tag/tutorials, which is confusing. I appreciate this article and it is great; I just want to try out the code.