Consuming Webthumb’s API in PHP

Joshua Eichorn, author of Understanding AJAX, has written a cool web service called “Webthumb”. If you are not familiar with Webthumb, you can visit the web home Joshua has set up for it at http://bluga.net/webthumb/.

It’s a simple service on the surface. You hand it a URL, it gives you back a thumbnail of that web site. At first glance you may think, “ok, why?” The answer to that is, of course, “it depends”.

  • Graphic designers can use it to create thumbnails of their work for archival purposes.
  • You may want to create a history of a web page to show its progress.
  • Since Amazon is now charging for Alexa’s thumbnail service, you may want to create thumbnails of sites you are talking about on your blog.

The list of good reasons to use such a service goes on.
Let’s pretend you are convinced that this is the service you’ve been looking
for to create your latest mashup and move on.

The Code:

Let’s get this out of the way, shall we? Some of you are interested in the article but there is a group of you (you know who you are) who just want the code. So here you go. Here is the link to the zip file and the link to the tar ball containing the class and a sample usage file. Note: To use it you will need a Webthumb API key. (It’s free, go get one.)

The Overview:

The API Joshua created uses custom XML payloads to
communicate between the client and the server. The service can also be used from
a web page on Joshua’s server by pointing a browser at http://bluga.net/webthumb/. This will give you a feel for what it is capable of.
If you watch that page for any length of time, you will see new thumbnails
appear thanks to the AJAXian goodness Joshua has coded into the
page.

When using the page to access the service, it asks you for 3
properties. Let’s discuss them now because they will be important later.

  1. URL
    This is the URL you want to thumbnail.
  2. Height
    This is not the height of the image, as you might think; it is the height
    of the browser window when taking the thumbnail.
  3. Width
    This is the width of the browser window when taking the thumbnail; again, it has
    nothing to do with the size of the actual image generated.

Go ahead, try one out so you can see what it does, I’ll
wait. There, finished? Ok, let’s move on.

Now that we understand what the service does, let’s look at
how we communicate with it. There are 4 messages you can POST to the server.

  • Request a thumbnail
  • Check the status of a job
  • Fetch a thumbnail
  • Check the status of all recent jobs

Each of them is an XML payload that you send to the server
and in return you get XML back. The syntax of the payloads is very simple so
this is a good ‘starter’ API to work with. In this article we will only be
dealing with the first 3 messages. Since they are all very similar, I’ll give
you an example of the Request message and point you to the docs at http://bluga.net/webthumb/api.txt if
you want to see the rest.

Request:

You can have multiple requests within a message so:

These would all be valid messages to send to the
server. There are additional parameters
that can be passed in via the request message. If you want to, you can specify
the height and width of the browser window to be used to take the snapshot. In
the above example, we pass in the height and width only on the third URL.

In response to the above message, the server would send back
something that looks like this:

This gives you an idea of what you need to send and what you
get back. The only major deviation from this pattern is the response to the
Fetch message. That returns an image so you have to deal with it differently.

Oh, before you go much further, you may want to go register
for a webthumb account. This will give you the API key that you’ll need to run
any of the examples.

The Class:

Ok, so we’ve delved into the goodness that is the API. But
who really wants to deal with that on a regular basis? Really, all you want to
do is get a thumbnail to use in your project. Ok, so let’s build a wrapper
class around the service to make that dead simple for us. First a quick word of
warning. This class is PHP 5 only. Sorry to you PHP 4 users out there but I
like PHP 5’s OOP much better so everything I write these days is in it. Now, on
with the code.

As luck would have it I spent a good amount of time writing the class we will need. I’m
going to dissect the interesting pieces here for you.

This class does three things. It requests thumbnails for
URLs, it checks the status of those requests, and it grabs the images and saves
them to your hard disk. It could be extended to do more but for the purposes of
this demonstration, that’s enough.

To request a thumbnail, you do this:

The first two lines are setup and need no explanation. As a side note, since several of the functions in webthumb throw exceptions, it’s a good idea to wrap your code in a try/catch construct. Also, I’ve ruined any possibility of cut-and-paste by adding line numbers to important lines. Bear with me, I’ll give you the URL to the entire sample before I’m done. Let’s break down the important lines.

Line 2. Again, none of this will work if you don’t get an API KEY. They are free so there’s no reason not to get one.

Line 3. This is where we add the URL. As you can see from here, we are utilizing all the optional parameters.

Line 4. This tells the object that we are finished adding URLs and it should go request the images.

Line 5. Once we have submitted the URLs we sit and wait.

Line 6. When the image is ready, we download it and create a file.

Line 7. Finally, since this is a demo, we do something silly with it like display it.

Great! Wonderful! But what is happening behind the scenes? There’s a lot more going on back there than we have described here. First let me make a couple of notes. To make this run on as many servers as possible, I steered clear of curl. (curl would have made this much easier) I know that curl is a standard library these days but there are still servers out there without it. To make the magic work, I stuck with the generic fsockopen() which should work on most any server with PHP 5 installed.

Ok, let’s dive into the code and swim around a bit, shall we? First let’s take a look at how we add URLs into the list to image.

To do this we call addURL. It adds the URL to the list and records any parameters we pass in, in a nested array. The parameters are pretty self-explanatory. As you can see we set reasonable defaults for the parameters so that everything but the URL is optional. We also initialize the nested array and go ahead and set up any rows in it we will need. It’s pretty boring stuff for the most part. The real fun comes when we make the call to submitRequests().
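
To make the bookkeeping concrete, here is a stripped-down sketch of the idea rather than the class itself. The class name, the array keys and the default values are placeholders picked for illustration, so don't expect them to match the download.

```php
<?php
// Illustrative sketch only: the class name, array keys and default values
// are assumptions, not the ones used in the downloadable class.
class ThumbQueueSketch
{
    // Requests indexed by URL, one nested array per URL.
    protected $requests = array();

    public function addUrl($url, $size = 'medium', $width = 1024, $height = 768)
    {
        if (trim($url) === '') {
            throw new InvalidArgumentException('A URL is required.');
        }
        // Set up every row we will need later, even the empty ones.
        $this->requests[$url] = array(
            'url'    => $url,
            'size'   => $size,
            'width'  => (int) $width,
            'height' => (int) $height,
            'status' => 'New',
            'job'    => null,
        );
    }

    public function getRequests()
    {
        return $this->requests;
    }
}
```

Only the URL is required here; everything else falls back to a default, which is the behaviour described above.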

submitRequests() fires off a chain of methods designed to build the message and then send it to the server. It uses several protected methods to do this. First, as always, we do a parameter check; we throw an exception if the array of requests is empty. There is no sense in going on if there are no requests to be made.

Next we call _buildSubmitRequestXml().

This builds the XML message for the requests to be
made. To build the message the method spins
through the array and for each array element whose status is ‘New’, it adds a
request to the output, concatenating them into one big string. At this point I need to say that yes, I could
have used SimpleXML to build the XML and export it but honestly, because these
messages are so simple, it was just easier to do it old-school and concatenate
strings.
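
A rough sketch of that old-school concatenation might look like the following. The function name, the array layout and the element names are assumptions made for the example; check api.txt for the real message format before relying on any of them.

```php
<?php
// Sketch only: element names and the array layout are assumptions for
// illustration; check api.txt for the real message format.
function buildSubmitRequestXmlSketch($apiKey, array $requests)
{
    $body = '';
    foreach ($requests as $request) {
        if ($request['status'] !== 'New') {
            continue; // already submitted, skip it
        }
        $body .= '<request>'
            . '<url>' . htmlspecialchars($request['url'], ENT_QUOTES, 'UTF-8') . '</url>'
            . '<width>' . (int) $request['width'] . '</width>'
            . '<height>' . (int) $request['height'] . '</height>'
            . '</request>';
    }
    if ($body === '') {
        return ''; // nothing in the New status; the caller decides what to do
    }
    return '<webthumb>'
        . '<apikey>' . htmlspecialchars($apiKey, ENT_QUOTES, 'UTF-8') . '</apikey>'
        . $body
        . '</webthumb>';
}
```

The one thing string concatenation makes easy to forget is escaping, which is why the URL goes through htmlspecialchars(): an ampersand in a query string would otherwise break the XML.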

Once we have the message to send to the server, we check it. If it’s empty then that means that there were no requests in the New status to be submitted so we throw an Exception. (We throw an exception because we can’t throw a brick at whoever called submitRequests() when there really were no requests to submit.)

Ok, we have our message and it is ok, so let’s do the final preparations. Right now what we have is a properly formatted XML message. But we have to send it to a web server, and web servers don’t speak XML. We need to make it into an HTTP message that the web server can understand. To do this we call _preparePayload(). We hand it our XML and it hands us back a properly formatted HTTP payload.
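
Wrapping XML in HTTP by hand is less scary than it sounds. The sketch below shows one way to do it, assuming a plain POST; the function name, host and path are placeholders and not the real endpoint.

```php
<?php
// Sketch only: the function name, host and path are placeholders, not the
// real API endpoint.
function preparePayloadSketch($xml, $host = 'api.example.com', $path = '/api')
{
    $headers = array(
        'POST ' . $path . ' HTTP/1.0',
        'Host: ' . $host,
        'Content-Type: text/xml',
        'Content-Length: ' . strlen($xml),
        'Connection: close',
    );
    // A blank line (CRLF CRLF) separates the headers from the body.
    return implode("\r\n", $headers) . "\r\n\r\n" . $xml;
}
```

The sketch speaks HTTP/1.0 on purpose: an HTTP/1.0 request keeps the server from answering with a chunked body, which would be a pain to reassemble by hand.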

Once we have our payload, the only thing left to do is submit it. _transmitRequest() takes care of all socket operations for us.

Once it transmits the payload to the server, it hangs around to collect everything the server sends back. In the case of submitRequests(), the response is an XML message that we will need to parse. In other instances it is the actual image data.
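
For anyone who has not used fsockopen() before, the send-and-collect step boils down to something like this sketch. The function name, the default port and the timeout are assumptions for the example.

```php
<?php
// Sketch only: a generic send-and-read-everything helper built on
// fsockopen(). The name, default port and timeout are illustrative.
function transmitRequestSketch($host, $payload, $port = 80, $timeout = 30)
{
    $socket = @fsockopen($host, $port, $errno, $errstr, $timeout);
    if ($socket === false) {
        throw new RuntimeException("Could not connect to $host: $errstr ($errno)");
    }
    fwrite($socket, $payload);

    // Keep reading until the server closes the connection.
    $response = '';
    while (!feof($socket)) {
        $chunk = fread($socket, 8192);
        if ($chunk === false) {
            break;
        }
        $response .= $chunk;
    }
    fclose($socket);

    return $response; // HTTP headers and body together
}
```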

In all cases, however, we get back more than just the XML data; the HTTP headers are still there because we are working with fsockopen() and not curl. (curl has an option to strip off the headers.) So, let’s check and then strip the headers and then parse the XML. To do this we call the method _parseSubmitResponse().

_parseSubmitResponse() takes our response and does a couple of things to it. First it checks the content type that was returned by calling checkContentType().

For some reason, if the server has a problem with the XML message it returns an error as HTML, not as XML. So checkContentType() throws an exception if the Content-Type is anything but XML.

Next, _parseSubmitResponse() checks to see if there is a DOCTYPE line. If there is, that means that there is an HTML message in with our XML. Since we don’t care about it, we trim it off and throw it away. This is really more of a precaution than a necessary step. However, there were times when the server returned a valid XML message and still returned HTML so the parser makes sure to strip this out.

Finally, we split the string into 2 parts, the HTTP headers and the XML message. We throw the former away and prepare to process the latter.
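
Here is a compact sketch of the same three steps, in a slightly different order. The function name and messages are made up for the example, and it assumes any stray HTML comes after the XML in the body.

```php
<?php
// Sketch only: function name and error messages are illustrative, and the
// stray HTML is assumed to FOLLOW the XML in the body.
function splitXmlResponseSketch($response)
{
    // The headers end at the first blank line.
    $parts = explode("\r\n\r\n", $response, 2);
    if (count($parts) !== 2) {
        throw new RuntimeException('Malformed HTTP response: no header/body separator.');
    }
    list($headers, $body) = $parts;

    // Errors come back as HTML, so insist on an XML content type.
    if (!preg_match('/^Content-Type:\s*([^;\r\n]+)/mi', $headers, $match)
        || stripos($match[1], 'xml') === false) {
        throw new RuntimeException('Expected an XML response.');
    }

    // Throw away anything from a DOCTYPE line onwards.
    $doctype = stripos($body, '<!DOCTYPE');
    if ($doctype !== false) {
        $body = substr($body, 0, $doctype);
    }

    return trim($body);
}
```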

Now that we have XML and just XML, let’s break it apart. For this we use one of my favorite new tools in PHP 5, SimpleXML. It’s just cool. We hand our message to SimpleXML and use foreach to extract the important pieces. It’s so much easier than trying to parse the string manually. Each job has a couple of pieces of information we want to track. There is one thing I need to mention, though. The array of images is indexed on the URL. Make sure when you use addUrl() that you specify the full URL including “http://”. The reason is that when the message comes back, the URL has been altered to include it. It would have been nice if we could have passed our own job id into the request and used that as an index.

One note about SimpleXML: since we are using it to parse the XML, everything is an object. Because of this we end up doing a lot of casting to get the variable types we want. It’s not difficult, it’s just important to know.
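
The casting looks like this in practice. This is a sketch under an assumed response layout (a jobs element holding job elements with url and estimate attributes); the function name and array keys are placeholders, and the real layout is in api.txt.

```php
<?php
// Sketch only: the response layout (a <jobs> element holding <job> elements
// with url and estimate attributes) is an assumption; check api.txt.
function parseSubmitResponseSketch($xml)
{
    $doc = simplexml_load_string($xml);
    if ($doc === false) {
        throw new RuntimeException('Response is not well-formed XML.');
    }

    $jobs = array();
    foreach ($doc->jobs->job as $job) {
        // SimpleXML hands back objects, so cast to get plain PHP types.
        $url = (string) $job['url'];
        $jobs[$url] = array(
            'job'      => (string) $job,           // element text: the job id
            'estimate' => (int) $job['estimate'],  // attribute, cast to int
            'status'   => 'Submitted',
        );
    }
    return $jobs;
}
```

Without the (string) cast the array key would be a SimpleXMLElement, which PHP will not accept as an array index.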

Once we have updated the array of images with the information that comes back, the first part is done. Now we wait. There are several ways we could do this but I chose the most straightforward. We use a loop with a sleep in it. We check to see whether all the requested images have completed. If not, we go to sleep for a pre-determined amount of time and then we check again. It usually only takes once, twice at the most, for all the images to finish.
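
A loop-with-a-sleep can be sketched in a few lines. In this version $isReady stands in for whatever check asks the server whether every job has finished, and the cap on attempts is an addition of the sketch so a stuck job cannot hang the script forever; all the names and numbers are illustrative.

```php
<?php
// Sketch only: $isReady stands in for the real "are all jobs done?" check;
// the names, delay and attempt cap are illustrative.
function waitUntilReadySketch($isReady, $delaySeconds = 5, $maxAttempts = 12)
{
    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        if (call_user_func($isReady)) {
            return $attempt; // how many checks it took
        }
        if ($attempt < $maxAttempts) {
            sleep($delaySeconds); // be polite, don't hammer the server
        }
    }
    throw new RuntimeException('Gave up waiting for the thumbnails.');
}
```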

To check whether the jobs are ready or not, we poll the server at regular but infrequent intervals. The poll is done by calling checkJobStatus().

As you can see from the code, checking the status of the outstanding jobs follows a similar pattern of Build Message, Build Payload, Transmit Payload, and Parse Response. We won’t rehash it all here but if you are curious, you can look at the code. One difference in checkJobStatus() is that it doesn’t throw an exception if the XML message is empty. It just silently returns.

Once readyToDownload() returns true we are ready to fetch the images. We can do this one of two ways. If you need fine-grained control over the process you can loop through the array yourself and call fetchToFile() for each one. If you like the defaults and just want to get it over with, call fetchAll(). All fetchAll() does is perform the loop for you and call fetchToFile(). fetchToFile() does all the work, so let’s look at it.

As the code above shows, fetchToFile() takes 3 parameters:

  • the job to fetch
  • the filename to save it as
  • what size image to fetch

It also uses the baseDir you set as part of the filename. If you don’t specify a filename, it will use the job id as the filename. Like before, it uses the same basic pattern. The difference here is the response is actual image data. Instead of parsing it, the method just strips off the headers and saves it to a file.
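
The strip-and-save step might be sketched like this. It is a stand-alone function rather than the method from the class, and the function name, parameter order and .jpg extension are assumptions for the example.

```php
<?php
// Sketch only: the function name, parameter order and the .jpg extension
// are assumptions made for illustration.
function saveImageResponseSketch($response, $baseDir, $jobId, $filename = null)
{
    // Everything after the first blank line is raw image data.
    $parts = explode("\r\n\r\n", $response, 2);
    if (count($parts) !== 2 || $parts[1] === '') {
        throw new RuntimeException('No image data in the response.');
    }

    // Fall back to the job id when no filename is given.
    $name = ($filename === null || $filename === '') ? $jobId . '.jpg' : $filename;
    $path = rtrim($baseDir, '/\\') . DIRECTORY_SEPARATOR . $name;

    if (file_put_contents($path, $parts[1]) === false) {
        throw new RuntimeException("Could not write $path");
    }
    return $path;
}
```

Note the limit of 2 on explode(): image data is binary and may well contain a CRLF CRLF sequence of its own, so we only split on the first one.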

That’s it. With this wrapper you can easily grab thumbnails of URLs to use…well, for just about anything. You can download a zip file of the code from my web site. It includes a simple example.

UPDATE: Since I wrote this article, Joshua has updated the API. It now provides callbacks when an image is ready to download. I’ll incorporate those in version 1.1 of this class and post here when it’s ready.

I hope you find the code and this article useful. Don’t hesitate to email me if you have questions.

=C=
