PHP Gains in the TIOBE Programming Community Index, and Takes 2nd in the JIPLFoG

The TIOBE Programming Community Index has been updated for April 2006, showing TIOBE's view of where programming languages stand in the development community. PHP moves up a spot, from 5th to 4th. The long-in-the-tooth C fades back a bit, relinquishing 1st place to Java, while C++ gains on its brethren with PHP nipping at its heels (a margin of 0.02 between their percentages). PHP has a higher climb rate, so expect another big move in the coming year.

But then again, what does this mean? How does TIOBE compute this index? Looking at the definition of their process, they state:

The ratings are calculated by counting hits of the most popular search engines. The search query that is used is

+"<language> programming" -tv -channel

The search query is executed for the regular Google, MSN, and Yahoo! web search and the Google newsgroups for the last 12 months. The web site has been used to determine the most popular search engines. The words “tv” and “channel” have been filtered out here to avoid any interference with TV programs. Otherwise languages such as ABC and Scheme would have been highly overrated.

By applying the search engine query as defined above, a lot of hit counts are collected. Let’s define “hits(PL#i,SE)” as the number of hits of programming language PL at position i of the TPC index for search engine SE. The counted hits are normalized for each search engine for the first 50 languages. More formally, the rating for PL#i becomes

(hits(PL#i,SE1)/(hits(PL#1,SE1) + ... + hits(PL#50,SE1)) + ... + hits(PL#i,SEn)/(hits(PL#1,SEn) + ... + hits(PL#50,SEn)))/n

where n is the number of search engines used.
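To make that formula a little less abstract, here is a minimal Python sketch of how I read it: for each search engine, take a language's share of that engine's total hits, then average the shares over the n engines. The function name `tiobe_style_ratings`, the engine names, and the hit counts are all invented for illustration. This is not TIOBE's code, and it assumes every engine reports a count for the same set of languages.

```python
from typing import Dict


def tiobe_style_ratings(hits_by_engine: Dict[str, Dict[str, int]]) -> Dict[str, float]:
    """Average each language's share of hits across all search engines."""
    n = len(hits_by_engine)
    ratings: Dict[str, float] = {}
    for counts in hits_by_engine.values():
        # Denominator: hits(PL#1,SE) + ... + hits(PL#k,SE) for this engine.
        total = sum(counts.values())
        for language, hits in counts.items():
            ratings[language] = ratings.get(language, 0.0) + hits / total / n
    return ratings


# Made-up counts for two hypothetical engines, only to exercise the arithmetic.
made_up_hits = {
    "engine_a": {"Java": 60, "PHP": 30, "Cobol": 10},
    "engine_b": {"Java": 40, "PHP": 50, "Cobol": 10},
}

ratings = tiobe_style_ratings(made_up_hits)
for language, rating in sorted(ratings.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{language}: {rating:.1%}")
```

Because each engine's shares sum to 1 before averaging, an engine that reports huge raw counts does not drown out the others, which seems to be the point of normalizing per engine.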

After reading this, I decided to try a manual comparison myself. I ran a test using Google and the query string provided in the TIOBE explanation (+"<language> programming" -tv -channel). Here are the results:

Java 13,600,000 hits
PHP 12,800,000 hits
Basic 7,810,000 hits
C 5,760,000 hits
Perl 5,610,000 hits
C# 4,660,000 hits
C++ 4,610,000 hits
Python 1,120,000 hits
Ruby 236,000 hits
Cobol 192,000 hits

Hmmm. This does not feel quite right. I’m sure Cobol has a bigger web presence, so maybe the query of “Cobol Programming” as a phrase match is too limiting, as that may not be how Cobol people talk? Maybe they say “Programming in Cobol” instead? Or “Cobol Portal for Programmers”. When removing the “Programming” part of the query, Cobol has 16,400,000 results. Wowzers. PHP jumps to 5,030,000,000. MEGA-WOWZERS! Let’s look at the rawer hit counts and see where the languages stand, then we can come up with a revised query. These are the hits for the query +<language> -tv -channel, just to keep it similar:

PHP 5,030,000,000 hits
Basic 1,550,000,000 hits (132,000,000 for “Visual Basic”)
Java 917,000,000 hits
Perl 328,000,000 hits
Python 231,000,000 hits
C++ 215,000,000 hits
Ruby 135,000,000 hits
C# 114,000,000 hits
Cobol 16,400,000 hits

C was dropped from the list as it is too hard to query as a single character with accurate results. We’ll add it back in later.

Now, the list changed a bit using that method. But since some of these language names also turn up in contexts that have nothing to do with programming, we need a better query. We need another significant word or two that limits the results to programming languages. Let’s just make a reasonable guess and go with that, since I can only type so many more queries before I give up on finding the truth in this matter (patience < truth). My query will now be +<language> +language +(programming OR program OR programmer) -tv -channel.

This still leaves the query for “C” inaccurate, since a single letter returns a lot of false positives. But we have enough other languages to look at to satisfy my curiosity, so we can just assume “C” is big and let it have a big number regardless of the false positives. “Basic” also returns many false positives and sites for other languages, so the “Visual Basic” or “VB” descriptor is used instead (since we favored “C”, we’ll make up for it by punishing “Basic”).
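If you would rather not retype these by hand, the three query shapes used in this post can be generated with a few lines of Python. This is only a sketch: the template names and the `build_query` helper are my own labels. Wrapping multi-word names such as “Visual Basic” in quotes is also an assumption about how such a name would be searched, not the exact query I typed.

```python
# The three query shapes from this post; the dictionary keys are made-up labels.
QUERY_TEMPLATES = {
    "tiobe_phrase": '+"{name} programming" -tv -channel',
    "bare_name": "+{term} -tv -channel",
    "refined": "+{term} +language +(programming OR program OR programmer) -tv -channel",
}


def build_query(template_name: str, language: str) -> str:
    """Fill one of the query templates for a given language name."""
    # Assumption: multi-word names are quoted so they match as a phrase.
    term = f'"{language}"' if " " in language else language
    return QUERY_TEMPLATES[template_name].format(name=language, term=term)


for template_name in QUERY_TEMPLATES:
    print(f"{template_name}: {build_query(template_name, 'Cobol')}")
```

The sketch only builds the query strings. Running them and reading off the hit counts is still a manual job in a browser.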

Regardless of the problems (and accuracy), here are the results:

C 240,000,000 hits (suspect, but a better query would give me a headache)
PHP 127,000,000 hits
Java 114,000,000 hits
C++ 57,100,000 hits
Perl 46,000,000 hits
Python 32,100,000 hits
C# 27,500,000 hits
Visual Basic 25,000,000 hits (178,000,000 for “Basic”)
Ruby 15,900,000 hits
Cobol 6,880,000 hits

Alright! Cobol gets respect! That looks more realistic, but then again we could all come up with reasonable queries that differ and change the results.
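To see just how much the choice of query moves things around, here is a small Python sketch that ranks each language under all three queries, using the hit counts listed in this post. The labels `phrase`, `bare`, and `refined` are mine. “C” has no entry for the bare-name query because it was dropped there, and the `refined` figure for “Basic” is the “Visual Basic” count.

```python
# Hit counts copied from the three result lists in this post.
HITS = {
    "phrase": {"Java": 13_600_000, "PHP": 12_800_000, "Basic": 7_810_000,
               "C": 5_760_000, "Perl": 5_610_000, "C#": 4_660_000,
               "C++": 4_610_000, "Python": 1_120_000, "Ruby": 236_000,
               "Cobol": 192_000},
    "bare": {"PHP": 5_030_000_000, "Basic": 1_550_000_000, "Java": 917_000_000,
             "Perl": 328_000_000, "Python": 231_000_000, "C++": 215_000_000,
             "Ruby": 135_000_000, "C#": 114_000_000, "Cobol": 16_400_000},
    "refined": {"C": 240_000_000, "PHP": 127_000_000, "Java": 114_000_000,
                "C++": 57_100_000, "Perl": 46_000_000, "Python": 32_100_000,
                "C#": 27_500_000, "Basic": 25_000_000,  # "Visual Basic" figure
                "Ruby": 15_900_000, "Cobol": 6_880_000},
}


def ranks(counts):
    """Map each language to its 1-based position when sorted by hits, descending."""
    ordered = sorted(counts, key=counts.get, reverse=True)
    return {language: position for position, language in enumerate(ordered, start=1)}


rank_tables = {query: ranks(counts) for query, counts in HITS.items()}

print(f"{'language':<10}" + "".join(f"{query:>9}" for query in HITS))
for language in HITS["refined"]:
    row = [rank_tables[query].get(language, "-") for query in HITS]
    print(f"{language:<10}" + "".join(f"{str(rank):>9}" for rank in row))
```

A language whose position swings wildly between columns is a decent hint that the query, not the language, is doing most of the work.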

Note: I make no guarantees about the accuracy of my survey or that it means anything at all. The “Jayson Index for Programming Languages Found on Google” (JIPLFoG) is not meant for use in investments or other financial transactions. Any loss you incur using this index shows that you should have read a different article today.

I’ll leave you now with the following exercise to continue this experience on your own time: Rank the languages by geekiness (+<language> +(geek OR nerd) -tv -channel)
