Zend Weekly Summaries Issue #240

TLK: PHP 5.1
TLK: PHP 5.1 (goto – full report)
TLK: PHP 5.1 (gd)
TLK: PHP 5.1 (recursion fixes?)
TLK: PHP 5.1 (ifsetor)
TLK: PHP 5.1 (awk)
TLK: PHP 5.1 (back on track)
TLK: PHP 5.1 (ifsetor revisited)
NEW: PHP 4_4 branch
FIX: Abstract private methods
TLK: PECL and the win32 build
TLK: Catalan translation team
TLK: MySQL socket
CVS: PHP 5.1 (beta 1)
PAT: dl and shutdown

TLK: PHP 5.1

The whole of this week’s inbox appears to have been swallowed up by the huge PHP
5.1 thread, to the extent that Andrei Zmievski suggested sending all messages with
the subject line ‘PHP 5.1’ directly to /dev/null. A number of sub-threads rose
and fell, leading to an almost complete break with tradition in this week’s Zend
summary. Be grateful for small mercies – the alternative was to have a single, solid
block of prose summarizing this extremely busy (important?) week.

Having said that, this time around there’s a full report covering the
arguments for and against goto in PHP, and this is a single,
solid block of prose – one that takes up nearly as much space as a standard full
week. However, it’s a single, solid block of prose that will simply be linked to the
next time this subject arises, full stop; this is not the sort of thing anyone would
want to write (or read) twice.

Short version: Please, please don’t hitch a ride on existing threads in
future…

TLK: PHP 5.1 (goto – full report)

The mention of goto (much feared in this department) was guaranteed
to at least treble the volume of internals list mail. Blame young Magnus Määttä,
whose innocent-seeming two-line email at the end of last week started it all. In
response to Jani Taskinen’s request, he also sent a link to Sara Golemon’s existing
patch, noting that he wasn’t certain whether it was the latest version. It wasn’t –
Sara subsequently rewrote her patch and quietly mailed the new version to Andi Gutmans for review.

Ilia Alshanetsky, Derick Rethans, George Schlossnagle, Andrey Hristov, Stefan
Esser and Lukas Smith were quick to announce their support. Wez Furlong, Marcus
Börger and John Coggeshall registered a more cautious vote ‘for the “limited”
version that Sara and Andi already worked out last time this came up’.

Sascha Schumann felt there was potential damage in adding ‘another horrid
language misfeature’ to PHP, and voted against it.

Olivier Hill was ‘pro’, but asked what the consensus had been on the PHP
goto’s limitations: ‘Should it be able to jump outside a control
block? Can it jump anywhere?’ Mike Robinson, voting against, felt that these
kinds of questions and issues exposed the ugliness of the feature.

Ondrej Ivanic, also voting against an unlimited goto, expressed his
opinion that it would be cleaner to extend break and
continue to accept a label in local scope. Stanislav Malyshev agreed
with Ondrej, adding that ‘goto in/out of the control block is an invitation to
hell, since it would mess up a lot of assumptions and can lead to
crashes/leaks’. Derick pointed out that the implementation under discussion
would only be able to jump to a static label within the current scope. He added that
it would be impossible to clean up at the end of the function using a parameterized
break/continue, unless the function was wrapped in a
control block – an ugly solution. Stas retorted that the finally
construct should be employed in this situation, and wanted to know how such a
limited goto would cope with an error condition from a non-current
block, or with an exception being thrown. Allowing goto to jump outside
the current block would cause problems with cleaning up variables, and would still
not resolve the exception problem.
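Derick’s ‘ugly solution’ and Stas’s finally suggestion can be compared in a short sketch. Everything here is illustrative (the function names and the step list are hypothetical), and try/finally only arrived later, in PHP 5.5, so at the time of this thread the one-shot control block was the only option of the two.

```php
<?php
// Hypothetical example: a multi-step task whose cleanup must always run.
// A step value of false stands in for "this step failed".

// Pattern 1: wrap the body in a loop that never loops, so that `break`
// can jump straight to the shared cleanup code (Derick's "ugly" wrapper).
function processWithBreak(array $steps, array &$log)
{
    $ok = false;
    do {
        foreach ($steps as $step) {
            if ($step === false) {
                break 2; // leave both the foreach and the do/while wrapper
            }
            $log[] = "step";
        }
        $ok = true;
    } while (false);
    $log[] = "cleanup"; // reached on success and on failure
    return $ok;
}

// Pattern 2: try/finally (PHP 5.5+); the finally block runs whether the
// try block returns normally or an exception is thrown and caught.
function processWithFinally(array $steps, array &$log)
{
    try {
        foreach ($steps as $step) {
            if ($step === false) {
                throw new RuntimeException("step failed");
            }
            $log[] = "step";
        }
        return true;
    } catch (RuntimeException $e) {
        return false;
    } finally {
        $log[] = "cleanup";
    }
}
```

Both versions guarantee that the cleanup line runs; the first does so by turning the whole body into a loop that never repeats, which is exactly the contortion Derick objected to.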

Ilia took issue with this, saying that the most common solution for emulating
goto in PHP is recursion, and that this in itself was problematic
currently. Lukas backed Ilia, bringing to the debate his own opinion that exceptions
as they currently are implemented in PHP pose a much greater risk of spaghetti code
than goto ever could, precisely because exceptions are able to
bubble out of local scope. In his view, goto in PHP was justified by the
fact that you wouldn’t need to create an object ‘just to be able to “goto”’,
along with the scope limitation. Derick echoed this sentiment, adding that
goto could be used to emulate exceptions where there should be none,
i.e. in procedural code. Sebastian Nohn rather flippantly suggested that PHP should
also have a comefrom to make this emulation feature-complete.

Responding to Ilia’s (and others’) concerns over exceptions being used in
procedural code, Stas voiced his own concern that goto would be abused
in all kinds of “creative” ways, making a mess of the code. Ilia felt that one
problem with exceptions was that, once one was thrown, it wasn’t possible to return
to code within the exception block. Using goto to do that, wrote Stas,
was exactly the kind of abuse that worried him:

Christian Schneider tossed his -1 vote into the fray, asking rather fretfully,
‘What happens if you goto into a loop?’

Lukas reckoned that, since static labels were part of the offered implementation,
it would be just as simple to catch a looping goto as to catch infinite
recursion. Besides which, the kinds of horrid things that might be done with
goto couldn’t possibly be harder to debug than exception abuses already
are.

Magnus said he’d had to rewrite code that contained a lot of
goto-emulating nested continue/break loops on
many occasions simply because they were so hard to follow, and using recursive
functions was not an option; it would crash PHP. Exceptions would be even harder to
maintain than continue/break. Stas didn’t understand why
exceptions, with their clear usage structure, should be harder to maintain than
goto, which has no safeguards at all. ‘Because not everybody and
their mother uses OO’, answered Derick, and Lukas reiterated the limited nature
of the implementation under discussion. Magnus wasn’t against exceptions, but failed
to see how they could replace the goto emulation he was currently
forced to use. He was seriously considering rewriting his code in C just to make it
maintainable. As to user confusion, ‘Just add a big fat red warning in the manual
or something that goto is most likely not something the everyday user
needs, but at least let us who know how to use it, use it’.

George was bemused by the notion that exceptions only belong in OO code, saying
that exceptions are distinct from OOP in many languages.

Ilia pointed out that any feature could be, and probably would be, abused.
He felt strongly that this shouldn’t prevent the addition of a useful tool.

Dmitry Stogov, a rare voice on the list, came in at this point to register his
negative vote. He felt that goto did not belong in a structured
scripting language, and that the addition of it would bring a lot of
misunderstanding and special cases, for example:


function foo() {
    // ...
    goto LABEL;
    foreach ($a as $v) {
        // ...
        LABEL: // jump target inside the loop body: the jump bypasses foreach's setup
        // ...
    }
}

Stas pointed out to Lukas that the proposed limitations were not enough to
prevent goto jumping in and out of various control blocks within the
current scope. Exceptions would allow you to go either to the end of the function or
to the end of the try block, but goto would allow you to jump into
random blocks of code that may make assumptions about the environment. The only way
to avoid trouble would be to severely limit goto, to the point where it
would be more like break in Perl.

Lukas argued that it would always be easier to find a single-scope
goto label than a catch block that would eventually handle an emulated
goto. Also, he added, ‘by now our “goto is bad”
professor must have died from a heart attack from the suggestion that exceptions
should be used to emulate goto for non-error cases’. Stas replied
that his concern wasn’t over searching for labels; it was more about whether
goto jumped to places where it shouldn’t logically be. Besides, he had
never advocated using exceptions to emulate goto.

It became very quiet for a while, as everyone went back and re-read the
thread.

Evidently a fast reader, Magnus re-entered the battlefield with ‘Not that I
see how exceptions came into the discussion, but anyway...’ and shared with us
all a heart-rending example of his life without goto.

Rasmus Lerdorf weighed in at this point to state that he felt quite ambivalent
about adding the feature at all:

He gave a couple of examples that left Andrey revising his concept of
‘limited’.

John Coggeshall gave a potentially award-winning example of horrible PHP code,
raising the question, ‘Since when did anyone care that we were giving users
enough rope to hang themselves with?’ Stas replied, ‘Giving rope is one
thing, giving a pack of explosives is another’. He suspected that most uses of
goto would by definition be abuse; the only legitimate usage he could
see would be to exit from a control block without counting brackets, which would in
his opinion be better solved by labeled break/continue.
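PHP already lets break and continue take a numeric argument saying how many enclosing loops to leave; a labeled form would replace that counting with a name. A minimal sketch of the existing numeric form, using hypothetical helper functions:

```php
<?php
// Hypothetical example: locate the first cell in a grid equal to $target.
// `break 2` leaves both loops at once, the "exit from a control block"
// case discussed above; the 2 has to be kept in step with the nesting by hand.
function findInGrid(array $grid, $target)
{
    $found = null;
    foreach ($grid as $row => $cells) {
        foreach ($cells as $col => $value) {
            if ($value === $target) {
                $found = [$row, $col];
                break 2; // exit the inner AND the outer foreach
            }
        }
    }
    return $found;
}

// `continue 2` abandons the current inner loop and resumes the outer one.
function rowsWithoutNegatives(array $grid)
{
    $rows = [];
    foreach ($grid as $row => $cells) {
        foreach ($cells as $value) {
            if ($value < 0) {
                continue 2; // skip this row, move on to the next one
            }
        }
        $rows[] = $row;
    }
    return $rows;
}
```

If another loop is later wrapped around the inner one, every 2 has to be found and bumped by hand, which is the maintenance problem a labeled break/continue would address.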
Were there other legitimate uses in PHP?

Jared Williams suggested that one such use might be in code generation tools;
Stas felt that the tool should adapt to the language rather than the other way
around. Derick pointed out sharply that every parser generator he knew of did in
fact use goto. Stas pointed out equally sharply that Java doesn’t even
have it. Robert Cummings backed Derick, saying that goto was generally
much faster for parsing than any construct used to emulate it. PHP user
Nelson Menezes backed Stas, saying that PHP was not the right language for building
a parser and pleading against goto: ‘let’s not introduce things that
we KNOW will generate problems’.

There was an ugly moment when Derick did the email equivalent of taking Nelson
outside.

Stefan Esser suggested that, since the addition of goto would
neither decrease the security of PHP scripts nor affect PHP’s reputation, it was
probably more urgent to remove the ability to work with files and access databases,
on the grounds that these features could be abused.

Stas asked again whether there were any legitimate uses for the language
structure in PHP beyond exiting control blocks. In Perl he could only see a need for
the goto& construct – not really goto at all. In C, it
was usually used as a replacement for multilevel
break/continue, in lexical parsers, and in the rare
cases where performance was a priority, such as operating system kernels. Ilia
promptly pointed out that goto is found in many parts of PHP’s source
code. Edin Kadribasic backed him with a reference to sapi/cli/php_cli.c, but
Stas felt that the same issue solved there by goto could equally well
have been solved with a labeled break, which he’d be quite happy to see
in PHP. Andrey asked whether C suffered from having goto. Stas, on a roll
by now, pointed out that what suits C won’t necessarily suit PHP: ‘Does C suffer
from being able to freely convert any type to any and access any memory location?
Should we add these features too?’ George chipped in to say that, funnily
enough, he and Andi had been talking about arbitrary class casting over dinner just
the other night… and Andrey pointed out that we already have free type conversion
in PHP.

Nelson bravely poked his head around the door to point out that Perl and C don’t
get used by coding newbies, whereas PHP does:

Stas reiterated that some constructs were unsafe to jump into. For example, jumping
into foreach might result in a crash, and jumping out of it in
memory leaks. George mentioned that the Perl implementation specifically forbids
jumping into control structures that require initialization, of which
foreach is one. Stas was quick to say that jumping out was also
problematic; Derick pointed out that the memory leaks created by jumping out didn’t
pose a real problem, as the Zend Engine cleans them up at the end of the request
anyway. George added that this was also a problem inherent in labeled
break. Stas disputed this, on the grounds that goto could
jump to a random place and not just to the end of the block; it wasn’t possible to
have an opcode take care of the memory leaks at a random place. Sara finally broke
her self-imposed silence on the matter to argue that this was indeed possible:

George added that state machines often make extensive use of goto to
avoid recursive calls. Sascha demonstrated why this wasn’t necessary, but George
pointed out that none of the features being discussed for PHP these days are
essential to the language: ‘We are far past a minimal set of language primitives
in PHP’.
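The usual goto-free shape for a state machine is a loop around a switch on the current state, with each transition taking one iteration instead of a jump or a recursive call. The tokenizer below is a hypothetical example of that shape, not George’s or Sascha’s code:

```php
<?php
// Classify a single character (hypothetical helper for this sketch).
function charClass($ch)
{
    if (preg_match('/^[0-9]$/', $ch)) {
        return 'number';
    }
    if (preg_match('/^[A-Za-z]$/', $ch)) {
        return 'word';
    }
    return 'other';
}

// Split a string into runs of digits and runs of letters, skipping anything
// else. $state plays the role goto labels would play; every transition is
// one pass of the while loop, so there is no recursion.
function tokenize($input)
{
    $tokens = [];
    $buffer = '';
    $state = 'start';
    $i = 0;
    $len = strlen($input);

    while ($i <= $len) {
        // A NUL byte stands in for "end of input" once $i reaches $len.
        $ch = $i < $len ? $input[$i] : "\0";
        switch ($state) {
            case 'start':
                $class = charClass($ch);
                if ($class === 'other') {
                    $i++; // skip separators and the end marker
                } else {
                    $state = $class; // re-examine $ch in the new state
                }
                break;
            case 'number':
            case 'word':
                if (charClass($ch) === $state) {
                    $buffer .= $ch;
                    $i++;
                } else {
                    // Run finished: emit it and go back to 'start'.
                    $tokens[] = [$state, $buffer];
                    $buffer = '';
                    $state = 'start';
                }
                break;
        }
    }
    return $tokens;
}
```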

Greg Beaver of PEAR stated that parsers written in PHP would be noticeably faster
if they used goto, and he didn’t see this as an insignificant issue.
A simple example in the manual showing proper usage of
break/continue and warning to only use goto
as a last resort would be sufficient for discouraging newbies from shooting their
feet off.

Petar Nedyalkov noted that the objections were unanimously usage-based;
restricting the use to a static label in the current scope should be fine. After
all, nobody had to use the thing…

Ilia made up a short list of situations where goto might be required
in PHP: to speed up a state machine parser, to control code flow in error handling,
and to provide a faster and safer alternative to recursive loops, e.g. in tree
drawing algorithms. That last item provoked a ‘Yuck’ from Stas, which
promptly drew fire from a few other quarters. Jason Garber sent in a thoughtful
email aiming to clarify – and remove the heat from – the arguments with a list of
feature evaluation points:

Unfortunately he also wrote the line, ‘Spaghetti code comes from an
inexperienced or sloppy developer, not from cool language features’, and the heat
was retained, with both David Zülke and Petar immediately backing that
sentiment.

Zeev Suraski argued, with a far deeper insight than many into the politics of
language, that ‘obscure constructs encourage obscure code’.
goto, in his opinion, might bring more harm than good; the only real
example of its usefulness that had been mentioned here was in scanners or parsers,
which are a negligible portion of PHP usage and can be written more readily in
languages optimized for such tasks. It didn’t make sense to spend time on such a
feature in a language designed for web scripting.

The debate continued, without anything new being said, until Andrei finally
snapped:

Short version: We’re going to add goto some time when
you’re not looking, but we’re not going to document it. Ever.

TLK: PHP 5.1 (gd)

David Zülke wondered what had happened to PIMP, and whether it would be bundled
with PHP 5.1.

Pierre replied that making PIMP a PHP extension was not his top priority right
now, but it should be available in its own repository later in the year. In the
meantime, he was working to synchronize the Boutell GD repository with the bundled
PHP GD version. I asked whether this means we’re going to see animated GIF support
(available in the new GD libs), but had no response.

Short version: PIMP isn’t on the agenda.

TLK: PHP 5.1 (recursion fixes?)

Springing from the Great goto Debate was the revelation (in Stas’
eyes at least) that recursive functions can result in PHP crashes. Stas wanted to
know where Ilia might have seen non-endless recursions that resulted in this. Ilia
explained that any recursive function actually takes only 13087 iterations to
force a crash, and gave a simple example:


<?php

$i = 0;

function a() {
    global $i;
    echo ++$i . "\n";
    a();
}

a();

?>

A more complex function, said Ilia, could take as few as 1000 iterations to crash
PHP.

Stas said that in that case PHP should be fixed to use fewer stack allocations,
but argued that there couldn’t be many cases where program logic would require
recursion 1000 levels deep.
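Ilia’s tree-drawing case can also be written without recursion (or goto) by keeping an explicit stack of pending nodes, so that nesting depth is limited by memory rather than by how deeply the engine can nest function calls. A sketch; the nested-array tree format is an assumption made for illustration:

```php
<?php
// Assumed tree format (illustration only):
//   ['name' => string, 'children' => array of nodes]  ('children' optional)
// An explicit stack replaces the call stack, so a deep tree does not turn
// into deep PHP function recursion.
function drawTree(array $root)
{
    $lines = [];
    $stack = [[$root, 0]]; // pending [node, depth] pairs
    while ($stack) {
        list($node, $depth) = array_pop($stack);
        $lines[] = str_repeat('  ', $depth) . $node['name'];
        $children = isset($node['children']) ? $node['children'] : [];
        // Push children in reverse so they are popped in their original order.
        foreach (array_reverse($children) as $child) {
            $stack[] = [$child, $depth + 1];
        }
    }
    return $lines;
}
```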

Short version: Recursion crashes have been brought to the attention of
someone likely to be able to fix them.

TLK: PHP 5.1 (ifsetor)

Lukas was prompt to vote for ifsetor(), and Dmitry just as prompt to
vote against it. From his standpoint, it didn’t make sense to implement simple
functions on the Engine level; if ifsetor() was added, we might as well
have strlen(), strpos() and everything else in
ext/standard as opcodes.

David Zülke chose a bad moment to vote ‘+1 for goto, and +712636
for ifsetor()’.

Zeev, as part of his ‘obscure constructs encourage obscure code’ speech, pointed
out that one of the key success factors of PHP was its minimal obscurity.
Unnecessary constructs could potentially damage the language. ifsetor()
is obscure, and 100% redundant; he couldn’t think of a single good reason to add it.
He went on to cancel out David’s vote completely with ‘-inf on
ifsetor(), -1 on goto’.

You’d think that might be the end of it, but nope; Noah Botimer started a new
thread (yay!) on the same subject (bleh), suggesting a specific behaviour and a name
change from ifsetor() to coalesce().
coalesce() would return the first non-null parameter, or
NULL if all parameters were null. Benj Carson wrote in to back him, and
Ron Korving liked the idea because coalesce() would ‘handle any
number of variables’. Ron was running on both threads at this point; he wrote to
the first thread asking if anyone would be interested in a parameter for
ifsetor() that treats isset() as !empty().
Taco van den Broek wanted it to check for != 0 too, but spotted the
intrinsic problem here; everyone agreed that they wanted a new construct, but nobody
agreed as to how the new construct should behave.
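The coalesce() semantics Noah proposed are easy to state in userspace. What a plain function cannot do is accept an undefined variable or array index without a diagnostic, which is the main argument for engine support. A minimal sketch, under that limitation:

```php
<?php
// Return the first argument that is not null, or null if all of them are
// (or if there are no arguments). Falsy values such as 0 and '' count as
// "set", which is what distinguishes this from a !empty() test.
// Being an ordinary function, it still emits a diagnostic when an undefined
// variable or array key is passed as an argument.
function coalesce()
{
    foreach (func_get_args() as $arg) {
        if ($arg !== null) {
            return $arg;
        }
    }
    return null;
}
```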

Sara felt that the emptiness concept should be left out of the picture, because
it can be done in userspace:


function firstNotEmpty() {
    $vars = func_get_args();
    foreach ($vars as $var) {
        if (!empty($var)) return $var;
    }
    return NULL;
}

She added pointedly, ‘There’s enough... contention over the undeniably useful
and not-implementable-in-userspace parts of this thread that it’s not worth muddling
it up with things that are a simple matter to do in userspace’.

Short version: We haven’t heard the last of ifsetor().

TLK: PHP 5.1 (awk)

Marcus wrote in to say that he’d forgotten to mention something important on his
initial 5.1 TODO list (remember that?). The problem was that the current
implementation of extension dependency relies on GNU awk (gawk); anything else would
generate an immediately segfaulting PHP binary. Either gawk needed to be checked for
during configure, or extension dependency needed to be rewritten via tables in the
module struct so that a dependent extension wouldn’t be initialized before the
extension it relies upon.
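The table-driven approach Marcus describes amounts to each extension declaring what it needs, with startup running in dependency order (SPL before PDO, for instance). The real mechanism would live in C in the module struct; the sketch below only illustrates the ordering idea in PHP, with a hypothetical dependency table:

```php
<?php
// $deps is a hypothetical table: extension name => list of extensions it needs.
// Returns a startup order in which every extension comes after everything it
// depends on (a depth-first topological sort), or throws on a cycle.
function startupOrder(array $deps)
{
    $order = [];
    $state = []; // name => 'visiting' while on the current path, 'done' once placed

    $visit = function ($name) use (&$visit, &$order, &$state, $deps) {
        if (isset($state[$name])) {
            if ($state[$name] === 'visiting') {
                throw new RuntimeException("circular dependency at $name");
            }
            return; // already placed in $order
        }
        $state[$name] = 'visiting';
        $needs = isset($deps[$name]) ? $deps[$name] : [];
        foreach ($needs as $dep) {
            $visit($dep); // start dependencies first
        }
        $state[$name] = 'done';
        $order[] = $name;
    };

    foreach (array_keys($deps) as $name) {
        $visit($name);
    }
    return $order;
}
```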

Wez reported that he and Magnus had spent some time testing different awk
implementations, with very little feedback from anyone else in the PHP community. As
far as he’d been able to ascertain under those circumstances, the problem was only
present when mawk was installed under the name awk; the dependency script works fine
on Solaris, which doesn’t even have gawk installed. The best thing, said Wez, would
be to have someone with mawk tweak the script so that it worked for them.

Short version: Build expert(s) with mawk installed needed for build
testing.

TLK: PHP 5.1 (back on track)

Andi finally put an end to most of the discussion(s) by re-stating the main
objectives: getting PDO, the new execution engine and other changes already in CVS
HEAD out to the public, and folding the Unicode work into public CVS before it
became a maintenance nightmare. He planned to roll PHP 5.1 beta 1 the next day, with
the aim of getting public feedback as soon as possible. It would be best to aim for
a PHP 5.1 release candidate at the beginning of July, close the 5_0 branch and merge
Unicode into CVS HEAD. This would provide the possibility of a quick development
cycle when it came to upgrading functions and extensions for Unicode support, and
the minor features currently under discussion could be addressed during that cycle.
There wasn’t anything to prevent the Unicode version (name still undecided) from
being released during 2005.

It would be to the benefit of all to make progress at this point. Besides, he
added in a postscript, ‘it seems like 100 people have had 101 different ideas for
ifsetor()’, which to him proved it belonged in userspace. He didn’t
object to a limited goto implementation being in the Unicode version –
but it wasn’t high priority, in his view.

Andrei, George and Rasmus were quick to back Andi over the need to drop the
discussions and push on with the 5.1 release. Rasmus added, ‘I am pretty sure the
current discussions will pale in comparison to the chaos that will be created when
the Unicode stuff goes into HEAD!’

Derick agreed to add the first fruits of his datetime work to CVS HEAD ‘today’,
and Pierre added that he’d need time to update pecl/date following Derick’s
changes. He wondered why the beta had apparently been brought forward, as it left
them both very short of time… Andreas Korthaus agreed that PHP 5.1 should be
released as soon as possible, but wondered whether pecl_http would be
included, saying it was a secure alternative to allow_url_fopen. PECL
developer Michael Wallner pointed out that pecl_http was ‘far from
stable’, and Derick explained that there is now a policy of not adding
specialized extensions to the PHP core anyway.

Wez requested that the beta might be postponed until the following weekend, as he
was in the middle of a house move and still had some PDO work outstanding.

Short version: Push me, pull you.

TLK: PHP 5.1 (ifsetor revisited)

Jason Garber argued that the potential uses for ifsetor() were in no
way eliminated by filtering; in migrating to development under E_STRICT
simple things, such as accessing an array key that may or may not have been
included, become ‘rather miserable’ without such a construct. If there were a
way to do it cleanly in userland PHP code, he would have done so already… Sean
Coates argued with Jason, saying that a ternary operator could get around the issue,
but Jason showed him this:

as opposed to the hoped-for:

Xuefer asked whether adding features necessarily makes PHP more complex, before
going on to argue that beginners would understand ifsetor() quite
easily, although default might be an easier name for non-English
speakers to grasp. Nick Loeve pointed out that default is in fact a
reserved word in PHP.

Marcus gave his view that the major advantages of ifsetor(), as
against isset() ? :, were that the former could be twice as fast and
much more readable.
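Jason’s examples did not survive, but the comparison Marcus draws can be sketched. The common userspace workaround takes the variable by reference, so that a missing variable or array key can be passed without a notice; the function name below only mirrors the proposal and is not a PHP built-in:

```php
<?php
// Userspace approximation of the proposed ifsetor(): return $var if it is
// set (and not null), otherwise $default.
// Taking $var by reference lets callers pass a missing variable or array key
// without a notice, but it also creates that key with a null value.
function ifsetor(&$var, $default = null)
{
    return isset($var) ? $var : $default;
}

// The inline isset() ternary it replaces names the variable twice:
//   $level = isset($config['level']) ? $config['level'] : 3;
// The function form:
//   $level = ifsetor($config['level'], 3);
```

The by-reference trick has costs a real construct would avoid: the default is always evaluated, literals cannot be passed, and a missing array key is silently created with a null value.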

Short version: We _still_ haven’t heard the last of ifsetor().

NEW: PHP 4_4 branch

Derick Rethans, Man of Action, announced that he’d created the PHP 4_4 branch,
and the 4_3 branch should no longer be used. Only critical fixes would be allowed in
the new branch; everything else would be reverted immediately. The snapshot builder
would be updated to create PHP 4.4.0-dev builds later that day. Any issues he might
have missed should be brought to his attention on the internals list.

Andrei, Derick and Wez went into a huddle over how best to close down the 4_3
branch, with Wez eventually denying commits to the old branch via the pre-commit
hooks. Derick then committed his references patch, and announced that he’d like to
release PHP 4.4 RC 1 the following Monday.

Short version: Blimey.

FIX: Abstract private methods

Stas asked whether anyone could think of a use for allowing abstract private
methods to be declared. In his own opinion, it was meaningless and should be
disallowed, as it only says “this method does not exist and never will”; was he
missing something?

John LeSueur thought it could be used to reserve a function for future use, but
John Coggeshall pointed out that you couldn’t create an instance of a class
containing abstract methods anyway. Andi and Sebastian both agreed that it was
useless. John L. continued to argue with Stas off-list, but Stas squashed his
argument by explaining carefully that child classes can’t see private methods. You
couldn’t sanely ask to both override and hide a method from a child
class.
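Since a subclass cannot see a private method, the combination that does make sense is abstract protected: every subclass must implement the method, but outside code cannot call it directly. A small sketch with hypothetical class names:

```php
<?php
// `abstract private function render();` would demand an implementation
// from subclasses that cannot see the declaration. `abstract protected`
// gives the intended effect: subclasses must supply render(), while
// callers outside the hierarchy go through the public display().
abstract class Widget
{
    public function display()
    {
        return '[' . $this->render() . ']';
    }

    abstract protected function render();
}

class Label extends Widget
{
    protected function render()
    {
        return 'hello';
    }
}
```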

Having achieved 100% agreement from the core devs (a first?), Stas proceeded to
disallow abstract private methods across all affected branches of PHP.

Short version (thanks Stas): RIP abstract private.

TLK: PECL and the win32 build

Wez wrote a useful note for those using the configure.js build system
under win32:

Short version: A tip from the top.

TLK: Catalan translation team

Bernat Foj Capell (CVS account request: jabro) started it:

Ten Catalan documentation account requests followed over the next few days,
causing Wez to question whether all – or indeed any – of these requests were
genuine. PHP Manual Editor Gabor Hojtsy confirmed that there was some apparent
activity, but pointed out that it would be impossible to ascertain whether the
offer was genuine until at least one of the applicants had CVS access and somewhere
to commit their work…

Short version: The entire PHP dev team read the CIA World Fact Book
entry on Catalonia this week.

TLK: MySQL socket

Systems administrator Vincent Pages came up with a way to force the hostname for
a MySQL server from php.ini, eliminating the need to ask all his users to
update the server name from ‘localhost’ on their web pages when their
cluster setup changed. Ilia wondered why the users hadn’t been asked to use the
correct name for the host in the first place; Vincent agreed that this would have
been better, but it wasn’t a solution for his current problem.

Dale Walsh felt that Vincent’s proposal made logical business sense and would
solve a lot of issues when MySQL support was provided as part of a web hosting
package. However, the flaw in the logic was that it didn’t offer any real
improvement to PHP itself; it would be considered a transitional feature by the PHP
development team, and they were unlikely to be interested in it as a result. He felt
that this political issue could be avoided by programming the php.ini changes
into the bundled MySQL source, as the MySQL development team were far more likely to
approve the idea.

Ilia pointed out that you can in fact force a given host already; you can set the
default connection parameters for mysql_connect() via the .ini
setting on a per-vhost basis, and disable the ability to specify them directly by
turning on SQL safe mode.

Short version: Storm in a haystack.

CVS: PHP 5.1 (beta 1)

Dmitry had an interesting week, applying whole armfuls of memory corruption and
leakage fixes to all branches of PHP. He also fixed previous opcode lookups in the
Zend Engine, merged three opcodes into one (killing off
ZEND_JMP_NO_CTOR and ZEND_INIT_CTOR; ZEND_NEW now speaks for all of them), and made changes allowing Zend
extensions to overload opcodes in all execution modes, in CVS HEAD only. Somewhere
in there is the need for a Zend API bump to reflect these changes prior to the next
release.

Wez spent some time trying to track down a problem with double freeing in streams
(read: potential crash) before admitting defeat and focusing on PDO instead, where
he did a lot of work to make user defined functions possible in PDO_SQLITE. He hit a
snag over ‘soft’ dependency in the build system; SPL needs to be initialized prior
to PDO to allow runtime exceptions in the latter. Jani came up with a proof-of-concept startup modification patch
for review, alongside a warning that in its current incarnation it makes
module startup slower; it’s just the concept we’re looking at here.

Jani also made a crowd-pleasing commit when he made it possible to have
--with-mysql --with-mysqli as part of PHP’s configure line. Derick
tried to outdo this by adding the --disable-zend-memory-manager
configure switch (making life easier for anyone trying to debug memory-related
issues in the Zend Engine), but my money’s on Jani here.

Andi ended the week by doing exactly what he’d said he’d do – he rolled the 5.1
beta, much to everyone’s shock. Following the howls of protest and a few last-minute
commits – including a fairly major PDO shutdown fix from Wez – he set out to roll a
second beta. Derick asked him nicely to sit on that version for a couple of days
before releasing it, and Andi complied.

Short version: Any major core changes should be made before beta…

PAT: dl and shutdown

Wez tracked down a dl() and shutdown issue to a call from
zend_clean_module_rsrc_dtors_cb() to non-existent lists when
dl()’d modules have registered resources, and mailed in ‘a hack’ to
handle that situation, which drew no public response at all.

Jani applied some NetWare/MySQL configuration changes from Kamesh Jayachandran
(this is becoming traditional), and Wez himself applied a patch from Zhao Ming Sen
for PDO_DBLIB, allowing statement row/column metadata to be reported correctly for
Sybase.

Short version: Zend Engine patch from Wez awaiting review in PAT.