Zend Weekly Summaries Issue #357

TLK: How to double ML volume
TLK: Turkish locale bugs
NEW: Reference macros
TLK: Namespaces and autoload [continued]
TLK: Constant folding optimization
CVS: zend_alter_ini_entry fixed
PAT: No good solution

2nd September – 8th September 2007

TLK: How to double ML volume

This week, an Enquirer turned up and alienated just about everybody on the
PHP core development team. How did he manage that? Prepare for a long, long
session while I count the ways.

It started with an innocuous query about the availability of PEAR classes for
PHP 6. Greg Beaver fielded that one, and sent the Enquirer to the PEAR
development list.

Roughly 12 hours later, the Enquirer returned with a new topic. Bug #42515 (MySQL Library Locations
Incorrect) was bogus, and he could prove it! The problem was that MySQL AB
were supplying faulty binaries; the issue had nothing to do with PHP at all!
Great… This news being delivered under a pseudonym (which I can’t use here
because it happens to be the name of a company which may be owned by someone
else), there was no way to assess the credentials of the author.

So it was that nobody swallowed the bait; the email went unanswered and all
was quiet, for another 12 hours or so. The Enquirer then tried another new
topic: “buildconf and the generated configure script for PHP6 is faulty.”
Two lines were ending up in the wrong place in the generated configure
script, breaking the build. The first few lines of buildconf
output read:


buildconf: checking installation...
buildconf: autoconf version 2.59 (ok)
buildconf: Your version of autoconf likely contains buggy cache code.
           Running cvsclean for you.
           To avoid this, install autoconf-2.13.

Jani Taskinen, who generally keeps the PHP build system chugging along,
explained with his usual bluntness that the Enquirer should
stick with the pre-generated configure script ‘and not try to hack things
since you obviously have no clue’. However, if he must build the
configure script himself he should use autoconf 2.13, since that
is the only version supported. The Enquirer demanded to know ‘who pissed in
Jani’s cornflakes?’… not that it really mattered, since he’d now fixed
the script locally to his own satisfaction. Carelessness on Jani’s part
wasn’t a problem to him, but Jani must’ve ‘stopped taking his
medication’ if he wasn’t prepared to offer a free service to get the
Enquirer’s configure script to rebuild nicely. Why should he need to
downgrade autoconf for PHP 6 when buildconf worked
fine with autoconf 2.59 in every other PHP version?

Jani didn’t respond; he was busy checking out the previous email about the
MySQL bug. Hartmut Holzgraefe did, removing the ‘inappropriate
language’ from the email history in his reply. He explained that
buildconf simply calls autoconf, and the fact that
there are differences in the output generated by different versions of
autoconf is a strong indication of bugginess. Why blame the
buildconf script for this? Nicolas Bérard-Nault pointed
out that the manual clearly states that autoconf 2.13 is required; you can have two
versions of autoconf on the same system, so no downgrade is
required; and finally, internals@ is not a general help list. David Coallier
was less angelic; he simply asked the Enquirer to take his personal insults
off-list.

Unfortunately for Hartmut, the Enquirer seems to have done just that. Some
time later, he saw fit to post the remnants of that exchange on-list as email
history. We learned that Hartmut ‘makes assumptions’ and ‘has a bad
memory’, and that his insistence on there being an autoconf
bug had led the Enquirer to mark his local changes as an unsupported
autoconf version fix. However, the Enquirer posted those changes
to internals@ just a few hours later, albeit in an unusable format. This
prompted Pierre-Alain Joye to explain that PHP has a bug tracking system, and using it is the
only way to be certain that patches aren’t lost.

We got a lovely letter out of that. Pierre was bewildered and upset by it; he
hadn’t meant anything bad, he’d only tried to tell the guy how the system
works. Hartmut took it upon himself to explain precisely why the system works
as it does, pointing out that the Enquirer’s attempts to buck it had led to
‘a mess’. The Enquirer didn’t like that description one bit. He posted a very
long and downright peculiar mail in response, detailing his expertise in
building PHP, his potential usefulness to the community, and the bad PR that
would result for the PHP project if he were to report every bug he found.
Stut’s reply said it for everybody; not reporting bugs because you’re worried
that the developers won’t cope with the volume or because you’re concerned
about damaging the reputation of the software is plain ‘idiotic’. Nuno Lopes
wrote that the Enquirer should either follow the process or stop sending his
innumerable posts to the PHP mailing lists (other lists were impacted this
week, it seems). Tony Dovgal simply asked the Enquirer to stop, ‘here, right
now’ if he hoped to stay out of everybody’s ignore list. If he wanted better
support for the Mac OS X platform, the way to go was to join the team, not
work against them. Stas Malyshev noted mildly that the team can’t actually
demand that the Enquirer either report bugs or post fixes, but it would be
‘good citizenship’ on his part to share the information.

The Enquirer retorted that if mentioning a handful of bugs garnered
complaints about his swamping the mailing lists, how would it be if he dumped
a couple of hundred? Didn’t it matter to anyone that PHP is universally seen
by end users as ‘a collection of cruft managed by a group of pompous
programmers’? He personally felt it better to refrain from submitting
non-critical bug reports rather than contribute to an already negative
impression. Still, he’d like to drop this discussion now, ‘to avoid
retaliations due to bruised egos’. Jani mentioned that Mac OS X would be
much better supported by PHP if someone would buy him one, and added it to
his Amazon wishlist. The Enquirer promptly tried to buy him one, but reported
his discovery that US-based Amazonians can’t make wishlist purchases for
Finnish Amazonians. Hartmut, responding to an earlier post, reiterated that
the proper place for bug reports is in the bug reporting system. He added
that the Enquirer’s insistence on ignoring this advice would lead to his not
being taken seriously, as indeed would the use of a pseudonym. Then he picked
up the ‘pompous programmers’ email, and saw red. He posted much the same
advice over again in response, this time with uncharacteristic heat.

William A. Rowe chose that moment to back the Enquirer. He wrote that
practically everyone had been less than civil during the course of this
thread, and the man actually had a good point. ‘PHP has more fragile
dependencies on build tools than any modern open source project out
there’, and the reliance on a long abandoned version of
autoconf was a completely valid topic of discussion on the
internals list. William’s attempt at appeasement failed, largely because the
Enquirer ignored it; he was busy writing a complete character
assassination of Hartmut at the time. The chief accusation, ironically,
was that of flaming. David repeated his request that any personal attacks be
taken off-list, and was promptly attacked himself. Everyone
simply stopped posting at this point; the thread had deteriorated to such a
degree that it couldn’t be remedied.

Unbelievably, 10 hours later the Enquirer was back with a new topic, or maybe
it was just a test to see if anyone would still read his posts. The rather garbled question he posted
might have been about ext/mysql, or it might have been about
ext/mysqli. Whichever it was, he believed the extension theoretically
should have SSL support if the underlying library had it, since that
information is available in the mysql_config file used during the
build. The Enquirer noted that ‘the mysql module’ lacks both
ARGS for SSL and a method to retrieve dependent libraries, but
wanted to check his assumptions before submitting a feature request.

Rasmus Lerdorf helpfully pointed to the entry about the
MYSQL_CLIENT_SSL connection constant in the PHP manual, but the Enquirer had already
found it. However, during the hour it took to get a ML response, he’d
discovered that MySQL’s SSL support is only enabled in PHP if
ext/openssl is also enabled. Manually adding the SSL libraries to the
Makefile was enough to get everything working, without enabling the
openssl extension. Perhaps ext/openssl should be auto-enabled,
or perhaps the SSL libraries should be added to either the SAPI build or the
MySQL build, where the underlying MySQL library had SSL support? If so, he
would be willing to do the preparatory work to put this into place,
and to do so in line with PHP’s standard development practices. If
not, the documentation needed fixing.

Amazingly, after all that had gone before, it was Hartmut that responded;
Hartmut that explained about the current status of ext/mysql; Hartmut
that gave good advice about testing the underlying
mysql_ssl_set() function and cross-version build compatibility.
The Enquirer’s responses were civil this time – give or take a single
unwarranted off-list attack, which Hartmut chose to ignore – but the words
‘thank you’ were a glaring omission.

On Saturday, right at the end of the period covered by this summary, the
Enquirer came back with yet another new topic. It seems the internals list is
in for a period of regular Enquirer attention. Hopefully, by now you’ll
understand and forgive my reluctance to fully report this or future exchanges
with this person, although I will of course continue to report any relevant
matters arising from such exchanges.

Short version: How rude.

TLK: Turkish locale bugs

Tomas Kuliavas wrote to internals@, somewhere in among the flames, demanding
an explanation. Why couldn’t his bug, #42526 (Broken classes and method names
in Turkish locale) be fixed? (Jani had closed it as a duplicate of bug #35050, ‘Capital “I” letters in
func/class method names do not work with Turkish locale’). It couldn’t be
because locale-insensitive tolower() breaks things, because PHP
functions are themselves locale-insensitive in some set-ups… He went on to
create a small flame of his own, which I’m going to ignore because I’m all
flamed out already and it didn’t add anything useful. Jani responded swiftly,
pointing out that Tomas was free to send a patch or to add his comments to the
#35050 bug report, which is currently marked as ‘Won’t fix’. However, he
added, please don’t add yet another report about a known issue into the
database in future.

Tomas retorted that he would add his comments on bug #35050, where Jani could
ignore them, and there was already a patch attached to the report for bug #35583 (Calling user defined
functions after setlocale("tr_TR") produces errors). He did
understand that the solution offered there was less than optimal, but the
only other fix he had for it was GPL licensed code and so couldn’t be used in
PHP. That apart, Tomas noted that there had been another change;
strcasecmp() is no longer locale-aware. Either there is no
regression test in place for bug #19795 (Problems with strnatcmp() and
strnatcasecmp()), or that test relies on a locale that isn’t
available on his box. strtolower(), strtoupper()
and stristr(), on the other hand, have all retained
locale-awareness; he’d yet to check the rest.
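
The failure mode behind bugs #35050 and #42526 can be modelled in a few lines. What follows is a minimal sketch in Python, used here purely as a neutral illustration language rather than PHP or the Zend Engine's C. It assumes that function names are stored lowercased and looked up by lowercasing the name at the call site; the helpers ascii_lower, turkish_lower and lookup, and the IsInt function name, are all hypothetical.

```python
# Hypothetical model of a case-insensitive symbol table; not Zend Engine code.

def ascii_lower(name: str) -> str:
    """Lowercase A-Z only, ignoring the current locale."""
    return "".join(
        chr(ord(ch) + 32) if "A" <= ch <= "Z" else ch
        for ch in name
    )

def turkish_lower(name: str) -> str:
    """Lowercase as a Turkish locale would: 'I' maps to dotless 'ı'."""
    return "".join(
        "ı" if ch == "I" else "i" if ch == "İ" else ch.lower()
        for ch in name
    )

def lookup(table: dict, name: str, lower) -> bool:
    """Return True if `name` resolves after case folding with `lower`."""
    return lower(name) in table

# Assumption: the function was registered under its ASCII-lowercased name.
functions = {ascii_lower("IsInt"): "<function body>"}

found_ascii = lookup(functions, "ISINT", ascii_lower)      # folds to "isint"
found_turkish = lookup(functions, "ISINT", turkish_lower)  # folds to "ısınt"
```

Under these assumptions the locale-aware fold produces a key that was never registered, which is the kind of mismatch a locale-insensitive tolower() is meant to avoid.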

Later in the week, Tomas presented a patch that he’d hoped to attach to bug
#35050, before finding that he couldn’t because the bug is closed. Jani
helpfully added a link to Tomas’ patch in the bug report, but explained that
he isn’t in a position to make any decisions about applying it in the Zend
Engine.

Short version: A possible pre-Unicode solution for bug #35050 –
thanks Tomas.

NEW: Reference macros

David Wang, of garbage collection fame, posted a patch to
manage reference counting and tracking using macros. All the macros, he
wrote, are implemented with forcibly inlined functions, making it possible to
put multiple statements into each macro at a later point for garbage
collection purposes. David added that he’d also renamed the existing
ZVAL_ADDREF and ZVAL_DELREF to
Z_ADDREF_P and Z_DELREF_P along the way, since
their original naming was non-standard.
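
To see why funnelling every reference count change through one function matters, consider this rough model. It is a Python sketch rather than C, so it cannot show forced inlining; it only shows the "room for more than one statement" argument. The Zval class, the gc_candidates list and the lowercase function names are hypothetical stand-ins that merely echo the macro names above, not the contents of David's patch.

```python
# Hypothetical model of a zval and its accessor functions; not the patch.

class Zval:
    def __init__(self, value):
        self.value = value
        self.refcount = 1

# Assumed hook target: values a future collector might want to inspect.
gc_candidates = []

def z_addref_p(zv: Zval) -> int:
    """Single entry point for incrementing a reference count."""
    zv.refcount += 1
    return zv.refcount

def z_delref_p(zv: Zval) -> int:
    """Single entry point for decrementing a reference count."""
    zv.refcount -= 1
    # Second statement added later without touching any caller:
    # a value that is still referenced after a decrement is recorded.
    if zv.refcount > 0:
        gc_candidates.append(zv)
    return zv.refcount

zv = Zval("example")
after_add = z_addref_p(zv)
after_del = z_delref_p(zv)
```

Because callers never touch refcount directly, the extra bookkeeping in z_delref_p reaches every call site at once, which is the future-proofing the patch is after.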

Nuno liked the idea overall, and suggested that it might be useful if the
team decided to investigate off-the-shelf garbage collectors in the future.
He didn’t think the ZVAL_*_P() functions needed to be anything
other than macros, but David explained that this was the whole point; inline
functions can accept multiple statements and so are much more flexible; Nuno
should think of them as future-proofing. A tracing garbage collector wouldn’t
require the macros, because reference counts would be eliminated altogether,
but implementing one would be ‘a pain’ (understatement). Off-the-shelf
garbage collectors would be inappropriate for PHP ‘because we use some
weird kinds of “pointers” (such as object handles) stored in weird kinds of
ways (such as a zend_hash object)’. Besides, they would be inefficient;
they only scan the stack, registers and heap, whereas in PHP garbage
collection would need to apply to the code the PHP interpreter is running,
not the interpreter itself. A traditional mark-and-sweep collector might be
faster, simply because it would eliminate the refcount field.
That said, ‘rummaging through objects scattered all over memory’ would
lead to a lot of cache misses; the question was whether it would lead to more
cache misses than there currently are. Answering that question would require
implementation, and David considered it would be a bit of a nightmare to
implement: ‘Roots would include zvals linked to PHP variables, the stack
of the running PHP code, and the stack and heap of the PHP interpreter
itself’. PHP wasn’t designed with memory management in mind, and the fact
that extensions rely on reference counting would make implementation
difficult. Referring to his garbage collection patch, David explained that
the cycle collector barely touches ‘the whole reference counting
mess’, which is the main reason he believes it relatively safe.
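
For anyone wondering why a cycle collector is wanted on top of reference counting at all, CPython happens to pair the two in much the same way, so it makes a convenient demonstration. The sketch below is an analogy only and says nothing about how David's collector is implemented; the Node class is a hypothetical stand-in for any value that can refer to another.

```python
import gc
import weakref

class Node:
    """Stand-in for a value that can hold a reference to another value."""
    def __init__(self):
        self.other = None

gc.disable()                      # rely on reference counting alone for now

a, b = Node(), Node()
a.other, b.other = b, a           # build a reference cycle
probe = weakref.ref(a)            # lets us observe whether `a` was freed

del a, b                          # drop the only outside references
alive_after_refcounting = probe() is not None

gc.collect()                      # run the cycle collector explicitly
alive_after_collection = probe() is not None

gc.enable()
```

Each node keeps the other's count above zero, so counting alone never frees them; only the separate collector pass does. The counting code itself is left alone throughout, which is the property David is relying on.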

Nuno thought off-the-shelf GC would still be worth a try if David’s macro
patch were accepted, even if it wasn’t the best solution out there. He
believed that GC wouldn’t even run for the majority of PHP requests; the
garbage would be collected after the requests, thereby reducing
latency. Nuno agreed that implementing GC from scratch would be ‘a
difficult job’, which was why he thought it worth investigating
off-the-shelf solutions. However, looking at garbage collection in other
languages showed that the implementations tended to take advantage of
internal structures. Perhaps PHP might find a student crazy enough to look
into it during next year’s GSoC… or perhaps Nuno might look into it himself
as part of his MSc…

Andi Gutmans agreed with David over both the unsuitability of mark-and-sweep
GC and the usefulness of inline functions, not least because the latter are
straightforward to debug. He liked the look of the patch, but wanted to give
the rest of the Zend team time to review it and comment. To Nuno, Andi added
that PHP script execution is very heap intensive. If the garbage collector
didn’t run during the request, there’d be a huge memory hit – big enough to
significantly affect the number of Apache processes that could run on a given
box.

Marcus Börger asked David to think about how usage of his macros could
be enforced. Should there be some random prefix for refcount and
is_ref, for example? David thought this a good point; he’d
actually been using __gc for just that purpose during his test
phase, but had removed it for the patch. Perhaps it should go back in, he
wrote, and promptly added it.

Andi came back with full team approval, but asked David to make it possible
to switch off the __gc naming using #if ZEND_GC.
Without that, a lot of third party libraries and PECL extensions would be
broken. David obliged, but Derick Rethans queried Andi’s request. Without the
prefix, there’d be no indication that a third party extension would break when
the upcoming garbage collector was running. Cristian Rodriguez pointed out
that extensions would break anyway because source compatibility is broken by
the patch, and asked David to bump the Zend API number as part of his
changes. David obliged again, but noted that the ZEND_GC switch
defeats the object of the __gc prefix, which (if you recall) was
supposed to force the use of the new macros. He offered up two versions of his
patch for consideration at this stage, one with and one without
ZEND_GC. Andi pointed out that many applications wouldn’t even
need garbage collection, and GC would probably not always be enabled, at
least to begin with; ‘let’s not run before we can walk’. The patch
still needed testing, reviewing and stabilizing, and extensions that don’t
need garbage collection shouldn’t be broken in PHP 5.3. The point of
committing David’s macros, from Andi’s perspective, was simply to make the
full garbage collector patch review easier during the test and stabilization
period. Once that was over, there’d need to be a decision; whether GC should
be always enabled, a configuration parameter or a compile-time option. Until
that decision had been taken, randomly breaking source code compatibility
made no sense.

David provided a laudably cool-headed analysis of the options. All things
well and truly considered, he concluded that the ZEND_GC switch was actually
a good idea for now, so long as it will be removed when garbage collection is
integrated. Andi wrote that David could commit the macro patch into CVS now,
or the Zend team would happily commit it if David didn’t have karma. David
explained that he didn’t even have a CVS account, but would need one in future
so that he could respond to any bugs arising from his changes. Adam Maccabee
Trachtenberg popped up out of nowhere with a link to the account request
page. And that – give or take a few quibbles over the nature of the testing –
was that.

Short version: The macros await David’s CVS account approval.

TLK: Namespaces and autoload [continued]

François Laupretre queried Dmitry Stogov’s assumption, in the
__autoload()/namespaces discussion last week, that it was okay
for autoload handlers to throw errors or exceptions under certain
circumstances. They shouldn’t, and in fact if the handler were registered
through SPL, they wouldn’t. When it came to the __autoload()
function itself, there was no point in raising an error when a symbol is not
found because the PHP interpreter does it anyway. François therefore
proposed that PHP should ignore any error or exception raised from autoload
handlers – which would render Dmitry’s proposed additional argument useless.

That said, François also had a plan to add a second argument to
autoload handlers, but with a different angle. He’d like to pass the type of
symbol being sought; class or interface. At present, autoload handlers have
to try both. This isn’t generally a problem, since most current autoload
handlers are primitive filename-based efforts that treat classes and
interfaces in the same way, but what will happen if/when support for function
and constant autoloading is added to PHP?
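
François' two suggestions, ignoring errors raised by handlers and telling each handler what kind of symbol is wanted, can be sketched together. This is a hypothetical Python model rather than PHP: the handler names, the (name, kind) signature and the autoload driver are all invented for illustration and are not PHP's or SPL's API.

```python
# Hypothetical model of an autoload chain; not PHP's actual API.

def load_from_files(name, kind):
    """Pretend file-based handler that knows about two symbols."""
    known = {("Logger", "class"), ("Countable", "interface")}
    return (name, kind) in known

def failing_handler(name, kind):
    """Handler that raises instead of quietly declining."""
    raise RuntimeError(f"cannot load {kind} {name}")

def autoload(handlers, name, kind):
    """Try each handler in turn, ignoring errors, until one succeeds."""
    for handler in handlers:
        try:
            if handler(name, kind):
                return True
        except Exception:
            continue  # a later handler may still find the symbol
    return False      # left to the interpreter to report the missing symbol

chain = [failing_handler, load_from_files]
found_class = autoload(chain, "Logger", "class")
found_wrong_kind = autoload(chain, "Logger", "interface")
```

With the kind passed in, a handler no longer has to try both class and interface; and because the driver swallows the first handler's exception, the second still gets its turn.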

Stas agreed that it doesn’t make a lot of sense to raise errors in a chained
autoloader, since the next in line might be able to load the class, but had
concerns about the performance impact of exhausting all autoloading
opportunities. It also wouldn’t be very friendly to have all that searching
going on just for something like $foo = new DateTime() – if
someone didn’t have their own DateTime class, they shouldn’t
have to pay for the possibility of having one. Stas didn’t quite get
François’ point about “having to try both” class and interface,
though, and asked him to explain more fully.

Short version: A staggered conversation that may end some time in
2009.

TLK: Constant folding optimization

Nuno presented the Zend team (and the internals list) with a patch
to implement constant folding optimization in the Zend Engine. Could they
please review it? He’d found two test regressions with the patch, both of
which appear because of the “division by zero” warning; this problem would
need to be fixed prior to committal.
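
For readers unfamiliar with the technique, constant folding simply means evaluating an expression at compile time when all its operands are literals. The following is a minimal sketch over Python's own syntax tree (it needs Python 3.9 or later for ast.unparse), not Nuno's patch; the ConstantFolder class and fold helper are hypothetical names. Leaving a division by zero unfolded is one conceivable way to keep the run-time warning, and is an assumption here rather than the approach the patch takes.

```python
import ast
import operator

# Operators this sketch knows how to fold.
_FOLDABLE = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}

class ConstantFolder(ast.NodeTransformer):
    """Replace arithmetic on two numeric literals with its result."""

    def visit_BinOp(self, node):
        self.generic_visit(node)  # fold the operands first
        fn = _FOLDABLE.get(type(node.op))
        left, right = node.left, node.right
        if (
            fn is None
            or not isinstance(left, ast.Constant)
            or not isinstance(right, ast.Constant)
            or not isinstance(left.value, (int, float))
            or not isinstance(right.value, (int, float))
        ):
            return node
        try:
            value = fn(left.value, right.value)
        except ZeroDivisionError:
            return node  # leave it so the error still surfaces at run time
        return ast.copy_location(ast.Constant(value), node)

def fold(source: str) -> str:
    """Fold constants in a single expression and return it as source text."""
    tree = ConstantFolder().visit(ast.parse(source, mode="eval"))
    return ast.unparse(tree.body)

folded_literal = fold("2 + 2")
folded_partial = fold("x + 2 * 3")
left_alone = fold("1 / 0")
```

Here "2 + 2" collapses to a single literal, "x + 2 * 3" keeps the variable but folds the literal product, and "1 / 0" is passed through untouched.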

Stas liked the idea, but thought the patch should enable constant expressions
in constant contexts, e.g. function foo($a = 2 + 2) or
const $a = 2 + 2. If the functionality were moved to parser
level, it could do this too. Nuno pointed out that this syntax isn’t actually
supported by PHP at present; he’d quite like it to be added, but it would mean
a few grammar rule modifications. He’d look into it, if people were
interested…?

Stas confirmed that this was what he’d had in mind. It wouldn’t break
anything, and it would allow a little more freedom; ‘let’s see if anyone
objects.’

Short version: Hands up if you have any idea what they’re talking
about…

CVS: zend_alter_ini_entry fixed

Changes in CVS that you should probably be aware of include:

  • Core bug #42468 (Write lock on
    file_get_contents() fails when using a compression stream) was
    fixed [Ilia, Jani]
  • In ext/pgsql, bug #42506
    (php_pgsql_convert() timezone parse bug) was fixed [Ilia]
  • In ext/mbstring, elderly bug #29955 (invalid case conversion in
    iso-8859-9) was fixed [Rui Hirokawa]
  • Zend Engine bug #42541 (Check
    for namespace decl. on first line doesn’t work when extended info is on) was
    fixed in CVS HEAD [Dmitry]
  • In the CGI SAPI, bug #42523
    (PHP_SELF duplicates path) was fixed [Dmitry]
  • In ext/soap, bugs #42488
    (SoapServer reports an encoding error and the error itself
    breaks) and #42214
    (SoapServer sends clients internal PHP errors) were fixed
    [Dmitry]
  • In the Apache SAPI, bug #42579
    (apache_reset_timeout() does not exist) was fixed [Jani]

In other CVS news, Jani took Ilia to task over his failure to merge changes
made in the 5_2 branch to CVS HEAD, at one point demanding that Ilia’s CVS
account be revoked. Andrei Zmievski intervened to ask Jani to calm down, and
Ilia made time to merge the missing patches. A mollified Jani wrote to Stas
about the zend_alter_ini_entry() problem Stas had enquired about at the end of
last week, but found a workable solution while waiting for a response. Uwe
Schindler tested Jani’s patch, and it went into the Zend Engine at the end of
the week.

Andrey Hristov was also busy this week. He fixed ext/mysqli bug #42378 (bind_result memory
exhaustion), and also a regression failure he’d found there, bug #38710 (data leakage because of
nonexisting boundary checking in statements). Tony promptly wrote to inform
him that the regression test supplied with the latter was failing on his
machine. Andrey managed to reproduce the failure on his own machine by
switching MySQL servers from version 5.1 to 5.0, and wrote that the 5.0
version reports bad metadata. He fixed the failing test by altering its
expected return value according to the server version.

Short version: The mysql/mysqli test coverage is starting to put
everyone else to shame.

PAT: No good solution

Tony having been reminded about the universal binary build fix earlier in the
week, he applied the eventual fix given by PHP user Christian Speich a couple of
weeks back to the Zend Engine and affected extensions. Unfortunately, he
forgot to credit Christian for his part in this; hopefully mentioning it here
will make up for that a little.

Some more FastCGI SAPI code from Mattias Bengtsson arrived, this time adding
checks for malformed FastCGI requests. Dmitry committed the checks.

François continued to argue with Rui Hirokawa that bug #42396 is a bug, and has been
ever since the __HALT_COMPILER() token was introduced, since the
presence of NULL bytes no longer reliably indicates Unicode
encoding. One solution might be simply to document it and banish the use of
__HALT_COMPILER() from Unicode encoded scripts, but this would
render the token ‘almost useless’. The solution François had
proposed in his patch might not be elegant, but at least it would mean that
__HALT_COMPILER() and zend_multibyte were no longer
incompatible. Given that the token is a recent innovation and not widely used,
he considered the performance hit acceptable for now.
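
The clash is easy to picture with a toy detector. The sketch below is Python, not the zend_multibyte code, and the NUL-byte heuristic it uses is an assumption about how such guessing might work; looks_multibyte and looks_multibyte_code_only are hypothetical names. The second function shows one conceivable workaround (only inspect the bytes before the token), which is not necessarily what François' patch does.

```python
# Hypothetical NUL-byte heuristic; not the zend_multibyte implementation.

HALT = b"__HALT_COMPILER();"

def looks_multibyte(script: bytes) -> bool:
    """Guess a wide encoding (e.g. UTF-16) from the presence of NUL bytes."""
    return b"\x00" in script

def looks_multibyte_code_only(script: bytes) -> bool:
    """Same guess, but ignore any data that follows the halt token."""
    code, _, _ = script.partition(HALT)
    return b"\x00" in code

plain = b"<?php echo 'hi';\n"
archive = b"<?php run();\n" + HALT + b"\x00\x01\x02binary payload"
utf16 = "<?php echo 'hi';".encode("utf-16-le")

plain_guess = looks_multibyte(plain)
archive_guess = looks_multibyte(archive)              # false positive
archive_fixed = looks_multibyte_code_only(archive)
utf16_guess = looks_multibyte_code_only(utf16)
```

An ordinary script with binary data appended after the token is misread as a wide encoding by the naive check. Note that the workaround shown only matches the token as ASCII bytes, so it would not find it inside a genuinely UTF-16 script.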

Sensing that this was unlikely to go far with Rui, François appealed
to Greg and Marcus at the end of his email for support or for better ideas.
Greg responded, but it wasn’t good news; he couldn’t see a solution without
making changes to PHP itself. That said, the declare(encoding)
statement introduced in PHP 6 would at least remove the guesswork regarding a
file’s encoding. Rui wrote at this point to let both archive authors know that
declare(encoding) is already supported in ext/mbstring,
and has been since PHP 4.3. He suggested that they set
detect_unicode off, and/or add declare(encoding) on
the first line of their scripts.

Short version: That zend_multibyte patch is going nowhere, but the
available solutions aren’t great.