Automatic Tag Clouds for Every Site


Everybody is talking about Web 2.0, and tag clouds are a common sight on so-called Web 2.0 sites. However, the only features that truly follow the Web 2.0 spirit are those that let users contribute content.

A tag cloud that complies with the Web 2.0 definition must be built from tags that the users themselves associate with the content. This is known as “folksonomy”, a term coined by Thomas Vander Wal and popularized by Tim O’Reilly’s writing on Web 2.0, as opposed to “taxonomy”, which is a categorization of content defined by the publishers rather than the site users.

Folksonomy is great because it drives user participation and gives users the satisfaction of making a meaningful contribution to the site.

However, folksonomy may also be a source of new problems. For instance, how do you prevent users with illicit interests from tagging content with unrelated keywords in order to spam the tag cloud generator and draw undue attention to other topics? In some cases, taxonomy ends up being the safer solution.

But how can you build a tag cloud for a site that does not have any kind of content tagging or categorization? This problem can be solved with the Automatic Keyword Generator class written by Ver Pangonilo from the Philippines.

The class analyzes the text of the content and suggests keywords based on how frequently expressions of one or more words occur in it.
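To illustrate the idea of frequency-based suggestions, here is a minimal sketch for single words only. It is not the code of the Automatic Keyword Generator class: the function name `suggest_single_keywords` and its two thresholds are hypothetical, and the tokenization (lowercase, split on anything that is not a letter or digit) is an assumption made for this example.

```php
<?php
// Hypothetical helper, NOT part of class.autokeyword.php.
// Counts how often each sufficiently long word occurs and keeps the frequent ones.
function suggest_single_keywords(string $text, int $minLength = 5, int $minOccur = 2): array
{
    // Assumption: lowercase the text and split on anything that is not a letter or digit.
    $words = preg_split('/[^a-z0-9]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);

    $counts = [];
    foreach ($words as $word) {
        // Ignore words shorter than the minimum length.
        if (strlen($word) >= $minLength) {
            $counts[$word] = ($counts[$word] ?? 0) + 1;
        }
    }

    // Keep only the words that occur often enough, most frequent first.
    $counts = array_filter($counts, fn (int $n): bool => $n >= $minOccur);
    arsort($counts);

    return $counts; // word => number of occurrences
}

$sample = 'The users are the main actors of the Web 2.0 . '
        . 'Without the users participation there are no Web 2.0 sites.';
print_r(suggest_single_keywords($sample));
```

With these thresholds, only "users" is both long enough and repeated in the sample sentence. Lowering the minimum length lets short, common words such as "the" through, which is why a real keyword extractor usually also needs a list of words to ignore.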

Although the suggested keywords may not always be the ones that best describe the content, the class can still save a lot of manual classification work. A content moderator can fix the unsatisfactory cases while benefiting from the work saved in all the other cases where the suggestions are appropriate.

Using the class is very simple. First, you specify the text of the content you want to tag.

$data = 'The users are the main actors of the Web 2.0 . Without the users participation there are no Web 2.0 sites.';

Then you define some parameters to specify how you want to analyze the text and extract its keywords.

$params = array(
  'content' => $data,
  // set the length of keywords you like
  'min_word_length' => 5,  // minimum length of single words
  'min_word_occur' => 2,  // minimum occurrences of single words
  'min_2words_length' => 3,  // minimum length of each word in 2 word phrases
  'min_2words_phrase_length' => 10, // minimum length of 2 word phrases
  'min_2words_phrase_occur' => 2, // minimum occurrences of 2 word phrases
  'min_3words_length' => 3,  // minimum length of each word in 3 word phrases
  'min_3words_phrase_length' => 10, // minimum length of 3 word phrases
  'min_3words_phrase_occur' => 2, // minimum occurrences of 3 word phrases
);
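To make the phrase thresholds more concrete, the sketch below shows one way the three two-word settings could be applied: a minimum length for each word, a minimum length for the whole phrase, and a minimum number of occurrences. This is an illustration under stated assumptions, not the class's actual implementation. The function `suggest_two_word_phrases` is hypothetical, and its simple tokenization ignores punctuation, so phrases may span sentence boundaries.

```php
<?php
// Hypothetical helper, NOT part of class.autokeyword.php.
// Counts adjacent word pairs that satisfy three thresholds.
function suggest_two_word_phrases(
    string $text,
    int $minWordLength = 3,    // plays the role of 'min_2words_length'
    int $minPhraseLength = 10, // plays the role of 'min_2words_phrase_length'
    int $minOccur = 2          // plays the role of 'min_2words_phrase_occur'
): array {
    // Assumption: lowercase the text and split on anything that is not a letter or digit.
    $words = preg_split('/[^a-z0-9]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);

    $counts = [];
    for ($i = 0; $i + 1 < count($words); $i++) {
        $first  = $words[$i];
        $second = $words[$i + 1];

        // Both words must reach the minimum word length.
        if (strlen($first) < $minWordLength || strlen($second) < $minWordLength) {
            continue;
        }

        // The whole phrase, including the space, must reach the minimum phrase length.
        $phrase = $first . ' ' . $second;
        if (strlen($phrase) < $minPhraseLength) {
            continue;
        }

        $counts[$phrase] = ($counts[$phrase] ?? 0) + 1;
    }

    // Keep only the phrases that occur often enough, most frequent first.
    $counts = array_filter($counts, fn (int $n): bool => $n >= $minOccur);
    arsort($counts);

    return $counts; // phrase => number of occurrences
}

$sample = 'The users are the main actors of the Web 2.0 . '
        . 'Without the users participation there are no Web 2.0 sites.';
print_r(suggest_two_word_phrases($sample));       // nothing qualifies with the defaults
print_r(suggest_two_word_phrases($sample, 3, 9)); // a shorter phrase length admits "the users"
```

A short text like this one rarely repeats long phrases, so strict thresholds can return nothing. In this sketch, relaxing the phrase length by a single character is enough to surface "the users".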

Then you initialize the class with your parameters, specifying the character set encoding of the text to classify.

include('class.autokeyword.php');
$keyword = new autokeyword($params, 'iso-8859-1');

Finally, you tell the class to generate the keywords for your content.

echo $keyword->get_keywords();
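Once every content item has its keywords, they still have to be turned into a tag cloud. The following is a rough sketch of that last step, assuming each item's keywords are available as one comma-separated string (check what `get_keywords()` actually returns in your copy of the class). The function `build_tag_cloud`, the font size range and the inline styles are all hypothetical choices for this example.

```php
<?php
// Hypothetical helper, NOT part of class.autokeyword.php.
// Assumption: each content item's keywords arrive as one comma-separated string.
function build_tag_cloud(array $keywordLists, int $minSize = 10, int $maxSize = 24): string
{
    // Count in how many content items each tag appears.
    $counts = [];
    foreach ($keywordLists as $list) {
        foreach (explode(',', $list) as $tag) {
            $tag = trim($tag);
            if ($tag !== '') {
                $counts[$tag] = ($counts[$tag] ?? 0) + 1;
            }
        }
    }
    if ($counts === []) {
        return '';
    }

    // Scale font sizes linearly between the rarest and the most common tag.
    $low    = min($counts);
    $spread = max(1, max($counts) - $low); // avoid division by zero when all counts match

    ksort($counts, SORT_STRING); // tag clouds are usually listed alphabetically

    $html = '';
    foreach ($counts as $tag => $count) {
        $size  = $minSize + (int) round(($count - $low) * ($maxSize - $minSize) / $spread);
        $html .= sprintf(
            '<span style="font-size: %dpx">%s</span> ',
            $size,
            htmlspecialchars((string) $tag) // escape tags before printing them as HTML
        );
    }

    return rtrim($html);
}

// Example keyword strings for three content items (made-up data).
echo build_tag_cloud(array(
    'users, web 2.0',
    'users, folksonomy',
    'users, web 2.0, tag cloud',
));
```

The more items share a tag, the larger it is printed. In a real site each tag would normally also link to a page listing the content tagged with it.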

I am sure there is plenty of room for improvement in this class. Still, its merits earned the author the PHP Programming Innovation Award for July 2006.
