Creating a Text Version Web Page On-the-fly – Part 2

June 20, 2002



Intended Audience

Overview
Learning Objectives
Background Information
Prerequisites
Introduction to PDFlib

Getting Started
Creating the PDFlib Class
Page handling functions
Simplifying routine tasks
Putting It All Together
The complete script
Resources
About the Author



Intended Audience


This tutorial is intended for PHP developers who want to modify web pages on the fly,
and who are interested in object-oriented programming and the PDFlib functions.

Overview


In Part 1 of this tutorial, you learned how to create a text version of a web
page from a static page. Part 2 extends the text conversion functionality to
deliver a PDF version of the web page. By leveraging the platform-independent
printing capabilities built into PDF, you can deliver printable documents much
more reliably than with HTML or CSS.

While this tutorial demonstrates the PDFlib functionality on our converted page,
the code could easily be adapted to create PDF documents from database content
or from form input fields.

Learning Objectives


In Part 2 of this tutorial, you will learn how to:

  • Add functionality to the previous text conversion class
  • Use functions within the PDFlib library
  • Encapsulate PDFlib functions in their own class


Background Information


The first part of this tutorial showed how to create an accessible version of
a web page on the fly. This was done by reading the static page into a variable,
removing HTML tags using regular expressions, and echoing the output to the
browser. By converting pages in this manner, an entire site can be made accessible
through the creation of one script. Additional functionality can be added to
this script to create a PDF document on the fly, so the page is more easily
printed. This approach is simpler and more straightforward than relying on the
formatting and printing capabilities of style sheets, which still lack full
support in the major browsers (including IE 6.x and Netscape 6.x).

Prerequisites


Familiarity with PHP, object-oriented syntax, HTML, and basic PHP configuration.

Introduction to PDFlib

PDFlib is a library of C functions which create PDF documents either as static
files or in memory (which are then streamed directly to the browser). While
not a full-fledged PDF layout engine, it performs most tasks quite nicely. However,
because it is a library of C functions, creating a class that encapsulates the
functions makes it a lot easier to work with.

I should stress that there are dozens of functions available in PDFlib, and
this tutorial touches on only a few of them. Advanced PDFlib functions can create
just about any type of PDF document imaginable, and provide an excellent means
of satisfying the requirement that forms received over the web must look exactly
like forms filled out by hand. I highly encourage you to explore this incredibly
useful tool.

Getting Started


Like most third-party libraries, PDFlib must be compiled into PHP for the functions
to be available. Depending upon your OS, this process can be fairly involved.
Visit the PHP site for more information regarding this task.

In Part 1 of this tutorial, we learned to convert an existing web page into
an accessible version. To convert the accessible version into a PDF, we will
simply remove all HTML tags and use the line breaks as we write the PDF document.
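
As a rough illustration of the "remove the tags, keep the line breaks" idea, here is a minimal sketch. The helper name html_to_plain_text() is hypothetical and is not part of the Part 1 class; it assumes that a line break should be kept wherever the HTML had a <br> or a closing block-level tag.

```php
<?php
// Hypothetical helper (not part of the tutorial's classes): turn simple HTML
// into plain text, keeping one line break per <br> or closing block tag.
function html_to_plain_text($html)
{
    // Turn <br> and closing </p>, </div>, </h1>..</h6>, </li> tags into newlines.
    $text = preg_replace('#<br\s*/?>|</(p|div|h[1-6]|li)>#i', "\n", $html);

    // Remove every remaining tag.
    $text = strip_tags($text);

    // Collapse runs of spaces and tabs, then trim each line.
    $text  = preg_replace('/[ \t]+/', ' ', $text);
    $lines = array_map('trim', explode("\n", $text));

    // Drop empty lines and rejoin with single line breaks.
    return implode("\n", array_filter($lines, 'strlen'));
}

echo html_to_plain_text("<h1>Title</h1><p>First <b>bold</b> line.<br>Second line.</p>");
```

For the sample input this yields three lines: "Title", "First bold line." and "Second line.". Real pages may need a longer list of block-level tags.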

Creating the PDFlib Class


There are two different ways to deliver a PDF document to the browser: either
as a file or stored in memory. Each method requires the PDF object to be opened
and closed in a different way. We will make our lives easier by creating a class
to manage these and other routine tasks. (The details of OOP in PHP are beyond
the scope of this tutorial; however, there are many useful resources available
on the subject, including Michael Johnson’s excellent tutorial on Zend’s site.)

Page handling functions

Our PDF class will set up several page variables as defaults when instantiated.
These defaults will work for most PDF documents. The class also handles the
PDF object itself, regardless of whether the PDF document is a static file or
created in memory.

<?php

class PDF {

    var $pdf;
    var $fn;
    var $buffer;
    var $author;
    var $title;
    var $creator;
    var $subject;
    var $page_close_flag;
    var $fonts;
    var $font_size;
    var $txt;
    var $x;
    var $y;

    function PDF($author, $title, $creator, $subject, $fn = FALSE)
    {
        if (!function_exists("pdf_set_info")) {
            /* pdflib must not be compiled in. Note that "new PDF(...)" still
             * yields an object; the return value only ends the constructor. */
            return FALSE;
        }

        $this->fn      = $fn;
        $this->author  = $author;
        $this->title   = $title;
        $this->creator = $creator;
        $this->subject = $subject;
        $this->pdf     = pdf_new();

        /* If fn is true, PDF file will be created.  False will create the PDF
         * document in memory */
        if ($this->fn) {
            pdf_open_file($this->pdf, $this->fn);
        } else {
            pdf_open_file($this->pdf);
        }

        /* set up fonts for use later */
        $this->fonts = array (
            "Courier",
            "Courier-Bold",
            "Courier-Oblique",
            "Courier-BoldOblique",
            "Helvetica",
            "Helvetica-Bold",
            "Helvetica-Oblique",
            "Helvetica-BoldOblique",
            "Symbol",
            "Times-Roman",
            "Times-Bold",
            "Times-Italic",
            "Times-BoldItalic",
            "ZapfDingbats"
        );

        /* These are only used once, when the document is created */
        pdf_set_info($this->pdf, "author",  $author);
        pdf_set_info($this->pdf, "title",   $title);
        pdf_set_info($this->pdf, "creator", $creator);
        pdf_set_info($this->pdf, "subject", $subject);

        /* Set default page params (sizes are in points) */
        $this->p_width              = 612;
        $this->p_height             = 792;
        $this->margin_l             = 20;
        $this->margin_r             = 20;
        $this->margin_top           = 40;
        $this->margin_bottom        = 40;
        $this->font_size            = 11;
        $this->font                 = "Times-Roman";
        $this->line_width_default   = 1.5;
        $this->default_line_spacing = 1.5;

        /* Start at the top left corner of the printable area */
        $this->x = $this->margin_l;
        $this->y = $this->p_height - $this->margin_top;

        $this->print_width  = $this->p_width - $this->margin_l - $this->margin_r;
        $this->print_height = $this->p_height - $this->margin_top - $this->margin_bottom;

        /* Page_close_flag set to true means there is not currently a page
         * open */
        $this->page_close_flag = TRUE;

    } /* end constructor */
?>
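
The page defaults in the constructor come down to a little arithmetic. The sketch below repeats that arithmetic in a standalone function so it can be run without PDFlib. The function name page_geometry() and the array keys are illustrative only; the numbers are the tutorial's defaults (612 x 792 points is US Letter at 72 points per inch, and PDF coordinates start at the lower left corner of the page).

```php
<?php
// Standalone sketch of the constructor's page arithmetic; no PDFlib required.
// The function name and array keys are illustrative, not part of the class.
function page_geometry($p_width = 612, $p_height = 792,
                       $margin_l = 20, $margin_r = 20,
                       $margin_top = 40, $margin_bottom = 40)
{
    return array(
        // Starting text position: left margin, just below the top margin.
        'x'            => $margin_l,
        'y'            => $p_height - $margin_top,
        // Size of the area that is left for text inside the margins.
        'print_width'  => $p_width - $margin_l - $margin_r,
        'print_height' => $p_height - $margin_top - $margin_bottom,
    );
}

print_r(page_geometry());
```

With the defaults, the printable area is 572 x 712 points and the starting position is (20, 752).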

While I could have the constructor start a new PDF page, I will do that
independently for more control. The member function start_page()
handles this. It also checks to see if there is a page currently open, and
if so, closes it.

<?php

    function start_page($bookmark, $width = FALSE, $height = FALSE)
    {
        /* There is a page still open, must close it first */
        if (!$this->page_close_flag) {
            pdf_end_page($this->pdf);
        }

        /* We can change the page width or height here by providing arguments
         * to this member function, but this isn't typical */
        if ($width) {
            $page_width = $width;
        } else {
            $page_width = $this->p_width;
        }

        if ($height) {
            $page_height = $height;
        } else {
            $page_height = $this->p_height;
        }

        /* Starts the page */
        pdf_begin_page($this->pdf, $page_width, $page_height);

        /* Add a bookmark, useful for the user */
        pdf_add_bookmark($this->pdf, $bookmark);

        /* Reset the left and top margins */
        $this->x = $this->margin_l;
        $this->y = $this->p_height - $this->margin_top;

        /* Initialize font */
        $this->set_font($this->font, $this->font_size);

        /* False means there is a page open */
        $this->page_close_flag = FALSE;

    } /* end start_page */

?>

The member function close_PDF() will close any open pages,
close the PDF document, and then either write the file or stream the PDF document
to the browser, depending upon how the object was created.

<?php

    function close_PDF()
    {
        if (!$this->page_close_flag) {
            /* Close any open page */
            $this->page_close_flag = TRUE;
            pdf_end_page($this->pdf);
        }

        pdf_close($this->pdf);

        if (!$this->fn) {
            /* Save buffer (if there is one) */
            $this->buffer = pdf_get_buffer($this->pdf);
        }

        pdf_delete($this->pdf);

        /* If created in memory, stream buffer to browser */
        if ($this->buffer) {
            $len = strlen($this->buffer);
            header("Content-type: application/pdf");
            header("Content-Length: $len");
            header("Content-Disposition: inline; filename=foo.pdf");
            echo $this->buffer;
        }

    } /* end close_PDF */

?>
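
The three headers sent by close_PDF() can be looked at in isolation. The following is a small sketch, not part of the class: the helper name pdf_response_headers() and the placeholder buffer are assumptions made for illustration, and the function only builds the header lines so that it can be run from the command line.

```php
<?php
// Hypothetical helper: build the header lines for an in-memory PDF.
// It returns strings instead of calling header(), so it runs anywhere.
function pdf_response_headers($buffer, $filename = "foo.pdf")
{
    return array(
        "Content-Type: application/pdf",
        // The length is the byte count of the PDF data.
        "Content-Length: " . strlen($buffer),
        // "inline" asks the browser to display the file rather than save it.
        "Content-Disposition: inline; filename=" . $filename,
    );
}

// Placeholder data standing in for the result of pdf_get_buffer().
$fake_pdf = "%PDF-1.4 placeholder";

foreach (pdf_response_headers($fake_pdf) as $line) {
    echo $line, "\n";
}
```

In a real script each line would be passed to header() before any other output is sent, since PHP cannot send headers once the response body has started.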


Simplifying routine tasks

That’s it for the page handling functions. Next, I will add a few functions
to simplify the routine tasks, and we will be on our way.

<?php

    function set_font($font = FALSE, $size = FALSE)
    {
        /* Sets font parameters; this member function will typically get used
         * a lot within a PDF document.  */

        /* See if requested font exists. array_search() returns key 0 for the
         * first font, so compare strictly against FALSE. */
        $font_key = array_search($font, $this->fonts);

        if ($font_key === FALSE) {
            $this->font = "Times-Roman"; /* default font */
        } else {
            $this->font = $this->fonts[$font_key];
        }

        /* Change font size if supplied as an argument */
        if (is_numeric($size)) {
            $this->font_size = $size;
        }

        pdf_set_font($this->pdf, $this->font, $this->font_size, "host");

    } /* end set_font */

    function set_txt_parms($txt, $x = FALSE, $y = FALSE, $align = FALSE)
    {
        /* Stores $txt and, when supplied, $x and $y ($align is accepted
         * but not stored) */
        $this->txt = $txt;

        if ($x) {
            $this->x = $x;
        }
        if ($y) {
            $this->y = $y;
        }

    } /* end set_txt_parms */

    function pdf_print_txt($txt, $x = false, $y = false, $align = false)
    {
        /* Prints text to PDF document, creating new pages as required */
        $this->set_txt_parms($txt, $x, $y, $align);

        /* Variable c holds the number of characters that couldn't be placed
         * in the text box. pdf_show_boxed handles everything (wrapping,
         * formatting, etc.) */
        $c = pdf_show_boxed(
            $this->pdf, $this->txt, $this->margin_l, $this->margin_bottom,
            $this->print_width, $this->print_height, "justify"
        );

        $pagenum = 1;

        while ($c != 0) {
            $pagenum++;

            /* Keep only the last $c characters, which did not fit */
            $this->txt = substr($this->txt, -$c);

            $this->start_page("Page $pagenum");

            $c = pdf_show_boxed(
                $this->pdf, $this->txt, $this->margin_l, $this->margin_bottom,
                $this->print_width, $this->print_height, "justify"
            );
        }

    } /* end pdf_print_txt */

} /* end class */
?>
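
The font lookup in set_font() depends on how array_search() reports a miss. It returns the matching key, which is 0 for the first font in the list, and FALSE when nothing matches, so the two cases need a strict comparison to tell them apart. The sketch below shows the lookup on its own; resolve_font() is a hypothetical name and the font list is shortened for illustration.

```php
<?php
// Hypothetical standalone version of the lookup used in set_font().
function resolve_font($requested, $fonts, $default = "Times-Roman")
{
    $key = array_search($requested, $fonts);

    // array_search() returns FALSE on a miss; key 0 is a valid hit,
    // so a loose test such as !$key would wrongly reject the first font.
    if ($key === false) {
        return $default;
    }

    return $fonts[$key];
}

$fonts = array("Courier", "Helvetica", "Times-Roman");

echo resolve_font("Courier", $fonts), "\n";     // first entry, key 0
echo resolve_font("Comic Sans", $fonts), "\n";  // not in the list
```

The first call prints "Courier" and the second falls back to "Times-Roman".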

The member function pdf_print_txt() takes our entire text string
and passes it to a PDFlib function called pdf_show_boxed(). If pdf_show_boxed()
is not able to print all of the text in the box specified (here, the printable
area of the page inside the margins), it returns the number of characters that
were not printed. We then enter a loop, starting a new page and printing the
remaining text, until pdf_show_boxed() returns zero.
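
The paging loop can be tried without PDFlib by swapping in a stand-in for pdf_show_boxed(). In the sketch below, fake_show_boxed() and paginate() are hypothetical names, and the stand-in assumes a page holds a fixed number of characters, which is far simpler than real text layout. The point is only to show how the count of leftover characters drives the loop.

```php
<?php
// Stand-in for pdf_show_boxed(): "prints" at most $capacity characters and
// returns how many characters were left over. Hypothetical, for illustration.
function fake_show_boxed($txt, $capacity)
{
    return max(0, strlen($txt) - $capacity);
}

// Splits $txt into pages the same way the tutorial's loop does.
// Assumes $capacity is greater than zero.
function paginate($txt, $capacity)
{
    $pages = array();

    $c = fake_show_boxed($txt, $capacity);
    $pages[] = substr($txt, 0, strlen($txt) - $c);

    while ($c != 0) {
        // Keep only the trailing $c characters that did not fit.
        $txt = substr($txt, -$c);

        $c = fake_show_boxed($txt, $capacity);
        $pages[] = substr($txt, 0, strlen($txt) - $c);
    }

    return $pages;
}

print_r(paginate("abcdefghij", 4));
```

Ten characters at four per page give three pages: "abcd", "efgh" and "ij". The negative offset in substr() matters here, because the leftover count is measured from the end of the string.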


Putting It All Together


To make everything work together, we start by getting the accessible page as
described in Part 1 of this tutorial (except that we strip ALL tags). One important
thing we must do is remove any special HTML characters remaining (“&nbsp;”,
etc.) or we could end up with some rather long strings, which the function pdf_show_boxed()
cannot handle.

<?php

    $obj = new HTMLtrans(parse_url($_GET['pageID']));

    /* We don't need the HTML tags for the PDF though */
    $new_txt = strip_tags($obj->modify_HTML($obj->html));

    /* Replace leftover HTML entities with plain characters */
    $new_txt = str_replace(
        array("&gt;", "&lt;", "&quot;", "&amp;", "&nbsp;", "&copy;"),
        array(">",    "<",    "\"",     "&",     "",       " copyright "),
        $new_txt
    );

?>
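
A hand-written replacement list only covers the entities you think of. In PHP versions that provide html_entity_decode(), which is newer than the code in this tutorial, the built-in function is a possible alternative. The sketch below is one way to use it; decode_entities() is a hypothetical helper, and it assumes the text is UTF-8. Whether characters such as the copyright sign then print correctly depends on the font encoding used for the PDF.

```php
<?php
// Hypothetical helper: decode HTML entities with PHP's built-in function
// instead of a hand-written replacement list. Assumes UTF-8 text.
function decode_entities($txt)
{
    // Decodes named and numeric entities, including both quote styles.
    $txt = html_entity_decode($txt, ENT_QUOTES, "UTF-8");

    // &nbsp; decodes to U+00A0 (bytes C2 A0 in UTF-8); swap it for a
    // plain space so the text can still wrap at that position.
    return str_replace("\xC2\xA0", " ", $txt);
}

echo decode_entities("Fish&nbsp;&amp;&nbsp;chips &lt;fresh&gt;");
```

The sample prints "Fish & chips <fresh>".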

Now I will instantiate the PDF class, feed it the text that was converted,
and have it create the PDF file.

<?php

    $obj = new PDF("Author", "My Title", "Creator", "My Subject");

    $obj->start_page("Page 1");

    $obj->pdf_print_txt($new_txt);

    $obj->close_PDF();

?>

The converted page is then made into a PDF document and sent to the browser.
In order to tell your user that this is going to happen, a descriptive link
is a good idea.


<a href="/pdfconversion.php?pageID=<?php echo urlencode("http://" . $_SERVER['HTTP_HOST'] . $_SERVER['REQUEST_URI']); ?>">convert to PDF</a>

Just include the text conversion class within the PDF conversion script,
and you are done.
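
To see what the link looks like once the page address has been encoded, here is a small sketch that builds the same markup from plain arguments. The function name pdf_link() and the example host and path are made up for illustration; the script path /pdfconversion.php is the one used above.

```php
<?php
// Hypothetical helper that builds the "convert to PDF" link from its parts.
function pdf_link($host, $request_uri, $script = "/pdfconversion.php")
{
    // Full address of the page to convert.
    $page = "http://" . $host . $request_uri;

    // urlencode() makes the address safe to pass as a query string value.
    $href = $script . "?pageID=" . urlencode($page);

    // htmlspecialchars() keeps the attribute value valid HTML.
    return '<a href="' . htmlspecialchars($href) . '">convert to PDF</a>';
}

echo pdf_link("www.example.com", "/news/index.php?id=7");
```

The colon, slashes, question mark and equals sign of the original address come out as %3A, %2F, %3F and %3D. Because the conversion script fetches whatever address it is given, it is worth checking that pageID points at your own site before using it.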

The complete script


You can download a zip file with the complete scripts here.

Resources


PDFlib – http://www.pdflib.com
PDFlib Installation – http://www.php.net/manual/en/ref.pdf.php
Using Objects to Create an Application – /zend/tut/tutorial-johnson.php

About the Author

Jim Thome has been a design engineer and IT professional since graduating from
Colorado State University with a degree in Mechanical Engineering, and is currently
Web Technology Administrator for the City of Fort Collins, Colorado USA. Please
send any questions or comments to jthome@fcgov.com.
