Using the Stack Exchange API with PHP (part 2)



User Interface

In the first part of this article, I introduced you to the Stack Exchange Web service API, which offers a way for developers to create custom Web applications that use content from Stack Overflow and its sister sites. I also introduced the StackPHP package, and gave you a crash course in using it to search for questions, retrieve answers and comments, and work with tags.

The thing to remember about questions, answers and comments, though, is that they don’t exist in a vacuum. They’re created by users, and it’s the users that make the site tick. That’s why the Stack Exchange API includes a large number of methods designed to let developers access user profiles and timelines, and unearth the relationships between users and their posts.

This article will focus primarily on this dimension of the Stack Exchange API, illustrating how to search for users, obtain user profiles and timelines, and retrieve information on a user’s questions, answers, comments, badges and tags. To wrap things up, I’ll also discuss a couple of simple ways in which you can improve your application’s performance with pagination and caching. So let’s get started!

The Who’s Who

Let’s start with something basic: getting a list of users. This is easily accomplished with the listUsers() method, which maps to the /users API endpoint. The results returned by the method can be filtered and sorted by specific attributes, such as account creation date, reputation or name. Here’s an example, which returns a list of users sorted by reputation:

The return value of the listUsers() method is an array of objects, each of which has properties corresponding to various user attributes. As the code above demonstrates, you can extract a great deal of profile information from each object, including the user’s name, user identifier, account creation date, location, question and answer counts, and free-form self-supplied biography.

Notice also that the service object in this case is not a Post_Exchange object (which was used extensively in the first part of this article) but a User_Exchange object. As the name suggests, this object provides a common repository of methods related to user information, and it will be used in almost all the examples in this article.
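To make the shape of such a result concrete, here's a small stand-alone sketch that sorts and prints a hand-made list of user objects. It makes no API call and doesn't touch the service object; the sample data and the property names (user_id, display_name, reputation) are assumptions for illustration, not taken from the StackPHP source.

```php
<?php
// Hypothetical sample data standing in for the objects in a user listing.
// The property names are assumptions for illustration only.
$users = [
    (object) ['user_id' => 101, 'display_name' => 'alice', 'reputation' => 1520],
    (object) ['user_id' => 102, 'display_name' => 'bob',   'reputation' => 9800],
    (object) ['user_id' => 103, 'display_name' => 'carol', 'reputation' => 430],
];

// Return a copy of the list sorted by reputation, highest first.
function sortByReputation(array $users): array
{
    usort($users, function ($a, $b) {
        return $b->reputation <=> $a->reputation;
    });
    return $users;
}

foreach (sortByReputation($users) as $user) {
    // Escape anything user-supplied before printing it in an HTML page.
    printf(
        "%s (#%d): %d\n",
        htmlspecialchars($user->display_name, ENT_QUOTES, 'UTF-8'),
        $user->user_id,
        $user->reputation
    );
}
```

Sorting locally like this is only needed if you want an order the API didn't give you; when the request itself asks for results sorted by reputation, the list should already arrive in that order.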

Here’s what the output of the script above might look like:

Searching For Godot

If you pass the listUsers() method a search string, it will return only those users whose names contain that string. This means that it’s quite easy to convert the previous listing into a simple search engine for users. Here’s the code:

This script creates a simple form for the user to enter a search string. Once the form is submitted, the user input is passed to the listUsers() method, which returns a list of users with names matching the search term. The ‘filter’ parameter is used to specify the search term, while the ‘sort’ and ‘order’ parameters specify that the result set should be sorted by reputation, with higher-reputation users first.
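The input-handling side of such a form can be sketched independently of the API. The snippet below is a hedged illustration, not the script described above: the form field name q, the helper names and the sample names are all hypothetical, and the local filter only imitates the "name contains the string" match that the API performs on the server.

```php
<?php
// Clean up a raw search string taken from a form field.
// Returns null when nothing usable was entered.
function cleanSearchTerm($raw): ?string
{
    if (!is_string($raw)) {
        return null;
    }
    $term = trim($raw);
    return $term === '' ? null : $term;
}

// Keep only the names that contain the term, ignoring case.
// This local filter merely imitates the server-side match for illustration.
function filterNames(array $names, string $term): array
{
    return array_values(array_filter($names, function ($name) use ($term) {
        return stripos($name, $term) !== false;
    }));
}

// Hypothetical names; in a real script the API would supply the matches.
$names = ['Godot', 'Vladimir', 'Estragon', 'godot_fan'];
$term  = cleanSearchTerm($_GET['q'] ?? ' godot ');

if ($term !== null) {
    foreach (filterNames($names, $term) as $name) {
        // Escape output, since names are user-supplied.
        echo htmlspecialchars($name, ENT_QUOTES, 'UTF-8'), "\n";
    }
}
```

Rejecting empty input before calling the API avoids a wasted request, which matters when requests are rate-limited.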

Here’s an example of the search results returned by this script:

Q & A

If what you’re really after is a list of questions, answers and comments posted by a user, the Stack Exchange API can give you that information too. These requirements are serviced by three methods in the StackPHP package, usersQuestions(), usersAnswers() and usersComments(), which respectively map to the /users/id/questions, /users/id/answers and /users/id/comments endpoints. Here’s a quick example that shows how this works:

If you paid attention to the first part of this tutorial, a good part of the script above will seem familiar to you. That’s because the objects returned by the usersQuestions(), usersAnswers() and usersComments() methods are very similar to those returned by the questions(), questionsAnswers() and questionsComments() methods discussed in the first part. It’s now quite easy to iterate over these objects and display the title and content of each type of post, together with a link.
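The mapping between the three methods and their endpoints is regular enough to express in a few lines. This is a sketch of the URL pattern only: userPostsPath() is a hypothetical helper, not part of StackPHP, and the user ID is made up.

```php
<?php
// Build the endpoint path for a user's posts of a given type.
// Hypothetical helper; it mirrors the /users/id/... pattern described above.
function userPostsPath(int $userId, string $type): string
{
    $allowed = ['questions', 'answers', 'comments'];
    if (!in_array($type, $allowed, true)) {
        throw new InvalidArgumentException("Unknown post type: $type");
    }
    return sprintf('/users/%d/%s', $userId, $type);
}

// Print the three paths for a made-up user ID.
foreach (['questions', 'answers', 'comments'] as $type) {
    echo userPostsPath(1, $type), "\n";
}
```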

Here’s a sample of what you might see:

Talk To The Badge!

If you’ve used Stack Overflow, you may be familiar with the concept of “badges”. Very simply, they’re rewards for participation, and they’re automatically earned as you contribute more to the site. So, for example, if you vote 300 or more times, you’ll earn the “Civic Duty” badge or, if you create a new tag that is used by 50 or more questions, you’ll win the “Taxonomist” badge. It’s a fun way to measure contributions, and to learn more about other users in the community.

The API offers a /badges method, which returns a complete list of available badges together with a count of how many times they’ve been awarded. Here’s an example:

Here, the badges() method invokes the /badges API endpoint and returns a list of badges as objects. This is what the output looks like:

Notice that each badge comes with a ‘tag_based’ attribute, which indicates whether the badge is earned through contributions to a particular tag or through a particular activity. You can also obtain this information through two other methods, badgesTag() and badgesName(), which return arrays of tag-based and non-tag-based badges respectively.
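If you already hold the full badge list, the same split can be made locally from the ‘tag_based’ flag. The following is a minimal sketch on made-up records; splitBadges() and the name property are assumptions for illustration, and only the tag_based flag comes from the description above.

```php
<?php
// Hypothetical badge records; only the 'tag_based' flag mirrors the text above.
$badges = [
    (object) ['name' => 'Civic Duty', 'tag_based' => false],
    (object) ['name' => 'php',        'tag_based' => true],
    (object) ['name' => 'Taxonomist', 'tag_based' => false],
];

// Split a badge list into tag-based and non-tag-based groups.
function splitBadges(array $badges): array
{
    $groups = ['tag' => [], 'name' => []];
    foreach ($badges as $badge) {
        $key = $badge->tag_based ? 'tag' : 'name';
        $groups[$key][] = $badge->name;
    }
    return $groups;
}

$groups = splitBadges($badges);
echo 'Tag-based: ', implode(', ', $groups['tag']), "\n";
echo 'Other: ', implode(', ', $groups['name']), "\n";
```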

It’s also possible to find out which users were awarded a particular badge with the badge() method, which accepts a particular badge ID and returns a list of all the users who were awarded that particular badge. Consider the next example, which illustrates this by retrieving the first five non-tag-based badges in the list, and then retrieving the names of users who have won those badges:

This script first gets a list of non-tag badges using the badgesName() method. It then iterates over the first five badges in this list, and for each badge, invokes the badge() method and passes it the badge ID as argument. The return value of the badge() method is a list of users who have been awarded that badge. This information is then formatted and displayed on the page, as shown below:

Playing Tag

Why am I talking about badges, you ask? Well, just as you can retrieve a user’s posts, so too can you retrieve the badges won by a user, and the tags that user has participated in. The Stack Exchange API offers the /users/id/badges and /users/id/tags methods to retrieve this information, and they’re reflected in the usersBadges() and usersTags() methods in the StackPHP package. Here’s how you can use them:

The usersBadges() method returns an array of objects representing the badges won by the user. As noted earlier, a badge can be either tag-based or non-tag-based; this distinction is recorded in the ‘tag_based’ property. The previous script displays only the user’s non-tag-based badges, including the badge name, the reason for its award, and the number of times it was awarded.

The usersTags() method returns a list of all the tags that a user has participated in. Tags identify the key subject areas of a question; when linked with a user, they serve as an indication of the depth and breadth of that user’s activity. The return value of the usersTags() method is similar to that of the tags() method: a list of tags that the user has posted a question, answer or comment in, together with a count.

Here’s what the output looks like:

Time Flies When You’re Having Fun

One of the more interesting methods in the StackPHP package is the usersTimeline() method. This method returns a timeline of user activity within a defined period, and corresponds to the /users/id/timeline API endpoint. It accepts a user identifier and an array containing a list of filters, of which the most useful are the ‘fromdate’ and ‘todate’ filters.
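A date range like this is easy to build with plain PHP. The sketch below assumes the filters are given as Unix timestamps, which is how the Stack Exchange API expresses dates; whether the package accepts them in exactly this form is an assumption, and lastDaysFilter() is a hypothetical helper.

```php
<?php
// Build a date-range filter array covering the last $days days.
// The keys follow the filter names mentioned above; passing Unix
// timestamps is an assumption about what the method expects.
function lastDaysFilter(int $days, ?int $now = null): array
{
    $now = $now ?? time();
    return [
        'fromdate' => $now - $days * 86400, // 86400 seconds per day
        'todate'   => $now,
    ];
}

$filter = lastDaysFilter(30);
echo gmdate('Y-m-d', $filter['fromdate']), ' to ', gmdate('Y-m-d', $filter['todate']), "\n";
```

The optional second argument exists only so the function can be exercised with a fixed "now" instead of the system clock.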

The best way to explain how this method works is with an example:

The return value of the usersTimeline() method is a timestream of events generated by a particular user. These events are represented as elements of an array, as shown below:

As you can see, each element represents a specific event in the timestream. The primary attributes to be aware of here are as follows:

  • The ‘timeline_type’ key specifies the type of event. Possible values are ‘comment’, ‘askoranswered’, ‘badge’, ‘revision’, or ‘accepted’;
  • The ‘post_type’ key specifies whether the event took place on a question or an answer;
  • The ‘post_id’ key specifies the post identifier related to the event;
  • The ‘creation_date’ key specifies the date and time of the event;
  • The ‘description’ key specifies the title of the question on which the event occurred.
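The attributes listed above are enough to turn an event into a readable line. Here's a hedged sketch: describeEvent() is a hypothetical helper, the event is treated as an associative array (an assumption about the package's return format), the sample values are made up, and the wording of the labels is purely illustrative.

```php
<?php
// Turn one timeline event into a short human-readable line.
// The keys are the ones listed above; the labels are illustrative.
function describeEvent(array $event): string
{
    $labels = [
        'comment'       => 'commented on',
        'askoranswered' => 'posted',
        'badge'         => 'earned a badge:',
        'revision'      => 'edited',
        'accepted'      => 'had an answer accepted on',
    ];
    // Fall back to a generic verb for any type not in the table.
    $verb = $labels[$event['timeline_type']] ?? 'was active on';
    $when = gmdate('Y-m-d H:i', $event['creation_date']);
    return sprintf('%s UTC: %s %s', $when, $verb, $event['description']);
}

// Made-up event for demonstration.
$event = [
    'timeline_type' => 'comment',
    'post_type'     => 'question',
    'post_id'       => 12345,
    'creation_date' => 1300000000,
    'description'   => 'How do I parse JSON in PHP?',
];
echo describeEvent($event), "\n";
```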

With all this information at hand, it’s possible to construct a timeline of user activity within a specified time period. Here’s an example:

Here’s an example of what the output might look like:

The Number Game

The Stack Exchange API also provides a stats() method, an interesting utility method for discovering statistics about the site. The data supplied by this method includes such tidbits as the total number of answered and unanswered questions, the total number of votes, the total number of badges issued, the number of questions posted per minute, the page views per day, and so on.
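Raw counts of this kind are much easier to read with thousands separators. The sketch below formats a made-up set of figures; the key names are assumptions rather than confirmed field names, and formatStat() is a hypothetical helper.

```php
<?php
// Made-up figures; the key names are assumptions, not confirmed API fields.
$stats = [
    'total_questions'      => 1523480,
    'total_unanswered'     => 210377,
    'questions_per_minute' => 2.35,
];

// Format a statistic: integers get thousands separators, floats two decimals.
function formatStat($value): string
{
    return is_int($value)
        ? number_format($value)
        : number_format((float) $value, 2);
}

foreach ($stats as $name => $value) {
    // Turn "total_questions" into "Total questions" for display.
    $label = ucfirst(str_replace('_', ' ', $name));
    echo $label, ': ', formatStat($value), "\n";
}
```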

Here’s an example that illustrates how you can use this method:

Here’s a sample of the data returned:

You can wrap this data in some formatting to make it more readable:

Here’s the result:

Turning The Page

So far, all the examples you’ve seen have displayed a single page of data – for example, the first 20 users, or the first 5 answers to a question. However, if you’re building a Web application, you’ll usually want to allow your users to access the remaining pages of a result set as well. The easiest way to do this is by wrapping a pagination component, like PEAR Pager or Zend_Paginator, around the result set of a method call.

Most Stack Exchange API methods support additional parameters for pagination: the ‘pagesize’ parameter specifies the number of items per page of the result set, while the ‘page’ parameter specifies which page to retrieve.
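Before passing these two values along, it's worth sanitizing them, since they usually come straight from the URL. This sketch shows one way to do that and to work out the number of pages; pageParams() and totalPages() are hypothetical helpers, and the cap of 100 items per page is an assumed limit that you should check against the API documentation.

```php
<?php
// Read page settings from a request array, falling back to safe defaults.
// The 100-item cap is an assumed limit; check the API documentation.
function pageParams(array $request, int $defaultSize = 20, int $maxSize = 100): array
{
    $page = max(1, (int) ($request['page'] ?? 1));
    $size = (int) ($request['pagesize'] ?? $defaultSize);
    $size = min($maxSize, max(1, $size));
    return ['page' => $page, 'pagesize' => $size];
}

// Number of pages needed to show $total items at $pageSize items per page.
function totalPages(int $total, int $pageSize): int
{
    return (int) ceil($total / $pageSize);
}

$params = pageParams($_GET);
echo http_build_query($params), "\n";
echo totalPages(95, $params['pagesize']), " pages\n";
```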

Here’s an example of using these parameters in combination with PEAR’s Pager package to add pagination when displaying the list of unanswered questions:

This script begins by including the Pager class file, and setting the number of items per page and current page number from GET variables in the URL string. These values are then incorporated into the unanswered() method call to retrieve the corresponding “page” of data from the Stack Exchange API. Each response from the API also includes a count of the total number of matching records; this count is extracted and stored in the $count variable, as it’s a critical input in determining the total number of pages.

An instance of the Pager class is then created via the class’ factory() method, and passed an associative array of configuration parameters. Here’s what each of them does:

  • The ‘mode’ parameter tells the Pager class whether page links should be generated in “jumping” or “sliding” mode.
  • The ‘perPage’ parameter indicates how many elements from the data set are to be displayed per page of data.
  • The ‘totalItems’ parameter specifies the total number of items in the data set.
  • The ‘delta’ parameter controls how many page numbers appear in the pagination bar.
  • The ‘itemData’ parameter specifies the data set to be paged.

With all of this information at hand, the getLinks() method can generate the HTML for the page navigation links automatically, returning it as an array that includes links to the first, last, previous and next pages. To display these links in the view, you simply need to output the contents of this array’s ‘all’ key.

Once the links are generated, the actual data set is passed to the Pager class via the ‘itemData’ parameter and the object is rebuilt. This two-step process is necessary because if we were to provide the data set in the first instance, the actual number of items in the data set would override the value of the ‘totalItems’ parameter, making it impossible to correctly generate the complete set of page links.

Here’s an example of the output:

If you prefer Zend_Paginator to PEAR Pager, it’s possible to replicate the above using Zend_Paginator’s Null adapter to create the pagination bar with appropriate links. Here’s what that version of the code would look like:

This script begins by setting up the Zend auto-loader, which takes care of automatically loading Zend Framework components as needed. It then initializes an instance of Zend_Paginator and passes the object constructor an instance of the Null adapter, which is itself initialized with the count returned by the service API call.

The current page and the number of items to be displayed per page can be set using the Zend_Paginator object’s setCurrentPageNumber() and setItemCountPerPage() methods. In the example above, these values are obtained from the request itself, through the $_GET array.

The getPages() method returns an object containing various bits of useful information: the total number of pages in the data set; the first, last, current, next and previous page numbers; the page numbers in the current page “window”; the total number of items in the data set; and the number of items per page. This information is very useful, because with just a few lines of code, it can be converted into a set of page links that allow the user to navigate through the data set. In the above example, a loop is used to create these links based on the information provided by the getPages() method.
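To show what such a page "window" amounts to, here's a hand-rolled approximation in plain PHP. It is not the algorithm used by Zend_Paginator or PEAR Pager; pageWindow() and renderLinks() are hypothetical helpers, and the ?page= URL scheme is an assumption.

```php
<?php
// Page numbers to show in a "sliding" pagination bar: up to $delta pages
// on either side of the current page, clipped to the valid range.
function pageWindow(int $current, int $last, int $delta = 2): array
{
    if ($last < 1) {
        return [];
    }
    $current = min(max(1, $current), $last);
    $start = max(1, $current - $delta);
    $end   = min($last, $current + $delta);
    return range($start, $end);
}

// Render the window as plain links; the "?page=" URL scheme is hypothetical.
function renderLinks(int $current, int $last): string
{
    // Clamp the current page so the highlighted entry is always in range.
    $current = min(max(1, $current), max(1, $last));
    $parts = [];
    foreach (pageWindow($current, $last) as $n) {
        $parts[] = $n === $current
            ? "<strong>$n</strong>"
            : sprintf('<a href="?page=%d">%d</a>', $n, $n);
    }
    return implode(' ', $parts);
}

echo renderLinks(5, 12), "\n";
```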

Here’s a sample of the output:

Cache Cow

The StackPHP package also provides a built-in caching mechanism, which can come in handy for high-traffic Web applications. This caching subsystem requires the SQLite3 extension to be installed, and can be activated simply by specifying a cache timeout value (in minutes) as the third argument to the service object constructor. The cache directory will be created under the script’s current working directory, assuming the Web server has file creation permissions.
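The idea behind a timeout-based cache can be sketched without the package at all. The example below is a simplified stand-in that stores JSON files on disk; it is not StackPHP's SQLite-based mechanism, and cachedFetch(), the directory name and the cache key are all hypothetical.

```php
<?php
// Return cached data for $key if it is younger than $minutes; otherwise
// call $fetch, store its result, and return that. Files live in $dir.
function cachedFetch(string $dir, string $key, int $minutes, callable $fetch)
{
    if (!is_dir($dir)) {
        mkdir($dir, 0775, true);
    }
    $file = $dir . '/' . sha1($key) . '.json';

    // Serve from the cache while the file is younger than the timeout.
    if (is_file($file) && (time() - filemtime($file)) < $minutes * 60) {
        return json_decode(file_get_contents($file), true);
    }

    // Cache miss or expired entry: fetch fresh data and store it.
    $data = $fetch();
    file_put_contents($file, json_encode($data), LOCK_EX);
    return $data;
}

// Usage: the closure stands in for a slow API call.
$dir  = sys_get_temp_dir() . '/stack_cache_demo';
$slow = function () {
    return ['alice', 'bob'];
};

$users = cachedFetch($dir, 'users-by-reputation', 5, $slow);
echo implode(', ', $users), "\n";
```

As with the package's own cache, the Web server needs permission to create files in the chosen directory.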

To illustrate, consider the next example, which revises the first script in this article to present a list of users with caching:

The first time you execute this script, the cache will be set, and you’ll see a message like this at the top of the output page:

All subsequent requests (until the cache expires) will be served from the cache, and you’ll see the following message at the top of the output page:

And that’s about all we have time for. As this two-part article has hopefully demonstrated, there’s a lot you can do with the Stack Exchange API, and it’s not just about retrieving questions and answers. A well-defined and flexible user API means that you can build detailed user profiles and timelines and also link users to their posts and contributions to the site. Add in built-in SQLite-based caching at the PHP library level and the ability to easily integrate with common pagination widgets, and you’ve got all the tools you need to build a solid, feature-rich mashup. Give it a shot sometime, and see what you think!

Copyright Melonfire, 2011. All rights reserved.