Your Friday Batch of Statistics


It has been said that statistics can be used to prove anything. Blog author jimbojw has taken that more as a challenge than as a statement and has released two posts on his blog detailing his “findings”.

jimbojw’s first post on the highly scientific subject of Languages that Suck was posted on 10/12/2006. In his purpose statement jimbojw states:

Everyone knows that programming languages suck, but which sucks the most? I have conducted a scientific study to answer this question, and humbly present my findings below.

He goes on from there to describe his methodology and present his findings. This is all well and good except that, as was pointed out by one of his commenters, his methodology was flawed. His findings assumed all languages were equal and therefore complaints about any given language could be considered equal. This is a false assumption. Given that PHP, Perl and Java all have larger code-bases in the wild, you would naturally expect the word “sucks” to appear more often for them. This has to be factored in as a ratio: the number of times the word “sucks” appears in the code, measured against the number of files indexed for a given language. (It is assumed that the word “sucks” appears only in the comments, as it’s not a keyword in any of the languages studied and, as far as I can tell, no vacuum cleaner systems are in the indexed codebase.) Therefore, even though PHP enjoyed a low position of #8 in the top 10 (#1 being the suckiest language), the study’s results cannot be relied upon.
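For anyone who wants to see why the raw counts mislead, here is a minimal sketch in Python. The language names and counts are invented purely for illustration (they are not figures from either study), and suck_ratio is a hypothetical helper name.

```python
def suck_ratio(sucks_hits: int, total_files: int) -> float:
    """Return the number of 'sucks' hits per indexed file for one language."""
    if total_files <= 0:
        raise ValueError("total_files must be positive")
    return sucks_hits / total_files


# Hypothetical counts, invented for illustration only.
counts = {
    "BigLang": {"sucks": 1000, "files": 1_000_000},
    "SmallLang": {"sucks": 100, "files": 10_000},
}

# Ranking by raw hits puts the language with the bigger code-base first.
by_raw = sorted(counts, key=lambda name: counts[name]["sucks"], reverse=True)

# Ranking by hits per file corrects for the size of the code-base.
by_ratio = sorted(
    counts,
    key=lambda name: suck_ratio(counts[name]["sucks"], counts[name]["files"]),
    reverse=True,
)

print(by_raw)
print(by_ratio)
```

With these made-up numbers the two rankings disagree: the larger language collects more raw hits, while the smaller one has more hits per file.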

Recognizing this and being dedicated to his craft, jimbojw dove back into the numbers. He refined his methodology further and came up with a second and much more detailed study he titled More Languages that Suck. From this work, which is bound to be picked up by one of the scientific journals that cover the field of suckiness, comes a whole new set of metrics. Here jimbojw describes his methodology for this latest study:

As in the first study, all data were collected from search results retrieved via Google’s Code Search. For each target language, three pieces of information were initially gathered:

Total Files
An approximation of the language’s footprint in Google’s database (and thus its popularity). Determined by one of the following queries: lang:<language-name>, lang:”<language-name>”, or file:.*\.ext where ext is the file extension of that language’s source code files.

Hacks
Measure of a language’s hackiness. Determined by one of the following: lang:<language-name> hack or lang:”<language-name>” hack

Sucks
Measure of a language’s suckiness. Determined by one of the following: lang:<language-name> sucks or lang:”<language-name>” sucks


When choosing between two queries, the one with the larger number of hits is kept. For example, there are approximately 4.4 million hits for lang:c, and 4.53 million hits for lang:”c”. In this case, the latter number is retained.
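The tie-break rule above is simple enough to sketch in a few lines of Python. The function name pick_hits is hypothetical, and the hit counts are simply the approximate figures quoted above, typed in by hand; nothing here queries Google’s Code Search.

```python
def pick_hits(*candidate_counts: int) -> int:
    """Keep the largest hit count among the alternative queries for a language."""
    if not candidate_counts:
        raise ValueError("at least one hit count is required")
    return max(candidate_counts)


# Approximate hit counts quoted in the methodology for the C language.
hits_unquoted = 4_400_000  # lang:c
hits_quoted = 4_530_000    # lang:"c"

total_files_c = pick_hits(hits_unquoted, hits_quoted)
print(total_files_c)
```

Presumably the same rule applies to the Hacks and Sucks queries as well, since each of those is also listed with two alternative forms.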

Obviously this was a much more detailed effort. The outcome plays in our (the PHP community’s) favor, in that PHP actually dropped a slot from #8 to #9 in the overall standings. Concerned visitors have already started posting new flaws in his methodology and discussing the project as a whole.

I’ll leave it to you to go out and analyze his findings and make them say what you want them to. I encourage you to read the results carefully, analyze them, scrutinize them and generally blow 30 minutes or so staring at what, from a distance, could be construed as work. If you are really ambitious, quote this study in an email to upper management as proof that your company really needs to abandon all other efforts and concentrate on PHP. However, a word of warning: if they don’t understand that it’s a joke, you need to start looking for a new job.

=C=
