Zend Weekly Summaries Issue #357


TLK: How to double ML volume
TLK: Turkish locale bugs
NEW: Reference macros
TLK: Namespaces and autoload [continued]
TLK: Constant folding optimization
CVS: zend_alter_ini_entry fixed
PAT: No good solution

2nd September - 8th September 2007

TLK: How to double ML volume

This week, an Enquirer turned up and alienated just about everybody on the PHP core development team. How did he manage that? Prepare for a long, long session while I count the ways.

It started with an innocuous query about the availability of PEAR classes for PHP 6. Greg Beaver fielded that one, and sent the Enquirer to the PEAR development list.

Roughly 12 hours later, the Enquirer returned with a new topic. Bug #42515 (MySQL Library Locations Incorrect) was bogus, and he could prove it! The problem was that MySQL AB were supplying faulty binaries; the issue had nothing to do with PHP at all! Great... This news being delivered under a pseudonym (which I can't use here because it happens to be the name of a company which may be owned by someone else), there was no way to assess the credentials of the author.

So it was that nobody swallowed the bait; the email went unanswered and all was quiet, for another 12 hours or so. The Enquirer then tried another new topic: "buildconf and the generated configure script for PHP6 is faulty." Two lines were ending up in the wrong place in the generated configure script, breaking the build. The first few lines of buildconf output read:

buildconf: checking installation...
buildconf: autoconf version 2.59 (ok)
buildconf: Your version of autoconf likely contains buggy cache code.
           Running cvsclean for you.
           To avoid this, install autoconf-2.13.

Jani Taskinen, who generally keeps the PHP build system chugging along, explained with his usual bluntness that the Enquirer should stick with the pre-generated configure script 'and not try to hack things since you obviously have no clue'. However, if he must build the configure script himself he should use autoconf 2.13, since that is the only version supported. The Enquirer demanded to know 'who pissed in Jani's cornflakes?'... not that it really mattered, since he'd now fixed the script locally to his own satisfaction. Carelessness on Jani's part wasn't a problem to him, but Jani must've 'stopped taking his medication' if he wasn't prepared to offer a free service to get the Enquirer's configure script to rebuild nicely. Why should he need to downgrade autoconf for PHP 6 when buildconf worked fine with autoconf 2.59 in every other PHP version?

Jani didn't respond; he was busy checking out the previous email about the MySQL bug. Hartmut Holzgraefe did, removing the 'inappropriate language' from the email history in his reply. He explained that buildconf simply calls autoconf, and the fact that there are differences in the output generated by different versions of autoconf is a strong indication of bugginess. Why blame the buildconf script for this? Nicolas Bérard-Nault pointed out that the manual clearly states that autoconf 2.13 is required; you can have two versions of autoconf on the same system, so no downgrade is required; and finally, internals@ is not a general help list. David Coallier was less angelic; he simply asked the Enquirer to take his personal insults off-list.

Unfortunately for Hartmut, the Enquirer seems to have done just that. Some time later, he saw fit to post the remnants of that exchange on-list as email history. We learned that Hartmut 'makes assumptions' and 'has a bad memory', and that his insistence on there being an autoconf bug had led the Enquirer to mark his local changes as an unsupported autoconf version fix. However, the Enquirer posted those changes to internals@ just a few hours later, albeit in an unusable format. This prompted Pierre-Alain Joye to explain that PHP has a bug tracking system, and using it is the only way to be certain that patches aren't lost.

We got a lovely letter out of that. Pierre was bewildered and upset by it; he hadn't meant anything bad, he'd only tried to tell the guy how the system works. Hartmut took it upon himself to explain precisely why the system works as it does, pointing out that the Enquirer's attempts to buck it had led to 'a mess'. The Enquirer didn't like that description one bit. He posted a very long and downright peculiar mail in response, detailing his expertise in building PHP, his potential usefulness to the community, and the bad PR that would result for the PHP project if he were to report every bug he found. Stut's reply said it for everybody; not reporting bugs because you're worried that the developers won't cope with the volume or because you're concerned about damaging the reputation of the software is plain 'idiotic'. Nuno Lopes wrote that the Enquirer should either follow the process or stop sending his innumerable posts to the PHP mailing lists (other lists were impacted this week, it seems.) Tony Dovgal simply asked the Enquirer to stop, 'here, right now' if he hoped to stay out of everybody's ignore list. If he wanted better support for the Mac OSX platform, the way to go was to join the team, not work against them. Stas Malyshev noted mildly that the team can't actually demand that the Enquirer either report bugs or post fixes, but it would be 'good citizenship' on his part to share the information.

The Enquirer retorted that if mentioning a handful of bugs garnered complaints about his swamping the mailing lists, how would it be if he dumped a couple of hundred? Didn't it matter to anyone that PHP is universally seen by end users as 'a collection of cruft managed by a group of pompous programmers'? He personally felt it better to refrain from submitting non-critical bug reports rather than contribute to an already negative impression. Still, he'd like to drop this discussion now, 'to avoid retaliations due to bruised egos'. Jani mentioned that Mac OSX would be much better supported by PHP if someone would buy him one, and added it to his Amazon wishlist. The Enquirer promptly tried to buy him one, but reported his discovery that US based Amazonians can't make wishlist purchases for Finnish Amazonians. Hartmut, responding to an earlier post, reiterated that the proper place for bug reports is in the bug reporting system. He added that the Enquirer's insistence on ignoring this advice would lead to his not being taken seriously, as indeed would the use of a pseudonym. Then he picked up the 'pompous programmers' email, and saw red. He posted much the same advice over again in response, this time with uncharacteristic heat.

William A. Rowe chose that moment to back the Enquirer. He wrote that practically everyone had been less than civil during the course of this thread, and the man actually had a good point. 'PHP has more fragile dependencies on build tools than any modern open source project out there', and the reliance on a long abandoned version of autoconf was a completely valid topic of discussion on the internals list. William's attempt at appeasement failed, largely because the Enquirer ignored it; he was busy writing a complete character assassination of Hartmut at the time. The chief accusation, ironically, was that of flaming. David repeated his request that any personal attacks be taken off-list, and was promptly attacked himself. Everyone simply stopped posting at this point; the thread had deteriorated to such a degree that it couldn't be remedied.

Unbelievably, 10 hours later the Enquirer was back with a new topic, or maybe it was just a test to see if anyone would still read his posts. The rather garbled question he posted might have been about ext/mysql, or it might have been about ext/mysqli. Whichever it was, he believed the extension theoretically should have SSL support if the underlying library had it, since that information is available in the mysql_config file used during the build. The Enquirer noted that 'the mysql module' lacks both ARGS for SSL and a method to retrieve dependent libraries, but wanted to check his assumptions before submitting a feature request.

Rasmus Lerdorf helpfully pointed to the entry about the MYSQL_CLIENT_SSL connection constant in the PHP manual, but the Enquirer had already found it. However, during the hour it took to get an ML response, he'd discovered that MySQL's SSL support is only enabled in PHP if ext/openssl is also enabled. Manually adding the SSL libraries to the Makefile was enough to get everything working, without enabling the openssl extension. Perhaps ext/openssl should be auto-enabled, or perhaps the SSL libraries should be added to either the SAPI build or the MySQL build, where the underlying MySQL library had SSL support? If so, he would be willing to do the preparatory work to put this into place, and to do so in line with PHP's standard development practices. If not, the documentation needed fixing.

Amazingly, after all that had gone before, it was Hartmut that responded; Hartmut that explained about the current status of ext/mysql; Hartmut that gave good advice about testing the underlying mysql_ssl_set() function and cross-version build compatibility. The Enquirer's responses were civil this time - give or take a single unwarranted off-list attack, which Hartmut chose to ignore - but the words 'thank you' were a glaring omission.

On Saturday, right at the end of the period covered by this summary, the Enquirer came back with yet another new topic. It seems the internals list is in for a period of regular Enquirer attention. Hopefully, by now you'll understand and forgive my reluctance to fully report this or future exchanges with this person, although I will of course continue to report any relevant matters arising from such exchanges.

Short version: How rude.

TLK: Turkish locale bugs

Tomas Kuliavas wrote to internals@, somewhere in among the flames, demanding an explanation. Why couldn't his bug, #42526 (Broken classes and method names in Turkish locale), be fixed? (Jani had closed it as a duplicate of bug #35050, 'Capital "I" letters in func/class method names do not work with Turkish locale'.) It couldn't be because locale-insensitive tolower() breaks things, because PHP functions are themselves locale-insensitive in some set-ups... He went on to create a small flame of his own, which I'm going to ignore because I'm all flamed out already and it didn't add anything useful. Jani responded swiftly, pointing out that Tomas was free to send a patch or to add his comments to the #35050 bug report, which is currently marked as 'Won't fix'. However, he added, please don't add yet another report about a known issue into the database in future.

Tomas retorted that he would add his comments on bug #35050, where Jani could ignore them, and there was already a patch attached to the report for bug #35583 (Calling user defined functions after setlocale("tr_TR") produces errors). He did understand that the solution offered there was less than optimal, but the only other fix he had for it was GPL licensed code and so couldn't be used in PHP. That apart, Tomas noted that there had been another change; strcasecmp() is no longer locale-aware. Either there is no regression test in place for bug #19795 (Problems with strnatcmp() and strnatcasecmp()), or that test relies on a locale that isn't available on his box. strtolower(), strtoupper() and stristr(), on the other hand, have all retained locale-awareness; he'd yet to check the rest.
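
For readers who haven't met this class of bug: in a Turkish locale the lower-case partner of 'I' is the dotless 'ı' rather than 'i', so a lookup that lower-cases identifiers through a locale-aware routine can miss a name that was registered under plain ASCII rules. The following is a minimal sketch of the locale-independent alternative, not Tomas' patch and not the Zend Engine's code; the helper names ascii_tolower() and ascii_strtolower_copy() are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: lower-case one byte using ASCII rules only,
 * regardless of the locale selected with setlocale(). */
unsigned char ascii_tolower(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') ? (unsigned char)(c + ('a' - 'A')) : c;
}

/* Hypothetical helper: copy len bytes of src into dst, lower-casing
 * A-Z only, then NUL-terminate. dst must hold at least len + 1 bytes.
 * Bytes outside A-Z (including high-bit bytes) are copied unchanged. */
char *ascii_strtolower_copy(char *dst, const char *src, size_t len)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < len; i++) {
        dst[i] = (char)ascii_tolower((unsigned char)src[i]);
    }
    dst[len] = '\0';
    return dst;
}
```

Because the mapping never consults the locale, "GetInfo" folds to "getinfo" whether or not setlocale() has been called. The price is that non-ASCII letters are not case-folded at all, which is roughly the trade-off being argued over in the thread.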

Later in the week, Tomas presented a patch that he'd hoped to attach to bug #35050, before finding that he couldn't because the bug is closed. Jani helpfully added a link to Tomas' patch in the bug report, but explained that he isn't in a position to make any decisions about applying it in the Zend Engine.

Short version: A possible pre-Unicode solution for bug #35050 - thanks Tomas.

NEW: Reference macros

David Wang, of garbage collection fame, posted a patch to manage reference counting and tracking using macros. All the macros, he wrote, are implemented with forcibly inlined functions, making it possible to put multiple statements into each macro at a later point for garbage collection purposes. David added that he'd also renamed the existing ZVAL_ADDREF and ZVAL_DELREF to Z_ADDREF_P and Z_DELREF_P along the way, since their original naming was non-standard.

Nuno liked the idea overall, and suggested that it might be useful if the team decided to investigate off-the-shelf garbage collectors in the future. He didn't think the ZVAL_*_P() functions needed to be anything other than macros, but David explained that this was the whole point; inline functions can accept multiple statements and so are much more flexible; Nuno should think of them as future-proofing. A tracing garbage collector wouldn't require the macros, because reference counts would be eliminated altogether, but implementing one would be 'a pain' (understatement). Off-the-shelf garbage collectors would be inappropriate for PHP 'because we use some weird kinds of "pointers" (such as object handles) stored in weird kinds of ways (such as a zend_hash object)'. Besides, they would be inefficient; they only scan the stack, registers and heap, whereas in PHP garbage collection would need to apply to the code the PHP interpreter is running, not the interpreter itself. A traditional mark-and-sweep collector might be faster, simply because it would eliminate the refcount field. That said, 'rummaging through objects scattered all over memory' would lead to a lot of cache misses; the question was whether it would lead to more cache misses than there currently are. Answering that question would require implementation, and David considered it would be a bit of a nightmare to implement: 'Roots would include zvals linked to PHP variables, the stack of the running PHP code, and the stack and heap of the PHP interpreter itself'. PHP wasn't designed with memory management in mind, and the fact that extensions rely on reference counting would make implementation difficult. Referring to his garbage collection patch, David explained that the cycle collector barely touches 'the whole reference counting mess', which is the main reason he believes it relatively safe.

Nuno thought off-the-shelf GC would still be worth a try if David's macro patch were accepted, even if it wasn't the best solution out there. He believed that GC wouldn't even run for the majority of PHP requests; the garbage would be collected after the requests, thereby reducing latency. Nuno agreed that implementing GC from scratch would be 'a difficult job', which was why he thought it worth investigating off-the-shelf solutions. However, looking at garbage collection in other languages showed that the implementations tended to take advantage of internal structures. Perhaps PHP might find a student crazy enough to look into it during next year's GSoC... or perhaps Nuno might look into it himself as part of his MSc...

Andi Gutmans agreed with David over both the unsuitability of mark-and-sweep GC and the usefulness of inline functions, not least because the latter are straightforward to debug. He liked the look of the patch, but wanted to give the rest of the Zend team time to review it and comment. To Nuno, Andi added that PHP script execution is very heap intensive. If the garbage collector didn't run during the request, there'd be a huge memory hit - big enough to significantly affect the number of Apache processes that could run on a given box.

Marcus Börger asked David to think about how usage of his macros could be enforced. Should there be some random prefix for refcount and is_ref, for example? David thought this a good point; he'd actually been using __gc for just that purpose during his test phase, but had removed it for the patch. Perhaps it should go back in, he wrote, and promptly added it.

Andi came back with full team approval, but asked David to make it possible to switch off the __gc naming using #if ZEND_GC. Without that, a lot of third party libraries and PECL extensions would be broken. David obliged, but Derick Rethans queried Andi's request. Without the prefix, there'd be no indication that a third party extension would break when the upcoming garbage collector was running. Cristian Rodriguez pointed out that extensions would break anyway because source compatibility is broken by the patch, and asked David to bump the Zend API number as part of his changes. David obliged again, but noted that the ZEND_GC switch defeats the object of the __gc prefix, which (if you recall) was supposed to force the use of the new macros. He offered up two versions of his patch for consideration at this stage, one with and one without ZEND_GC. Andi pointed out that many applications wouldn't even need garbage collection, and GC would probably not always be enabled, at least to begin with; 'let's not run before we can walk'. The patch still needed testing, reviewing and stabilizing, and extensions that don't need garbage collection shouldn't be broken in PHP 5.3. The point of committing David's macros, from Andi's perspective, was simply to make the full garbage collector patch review easier during the test and stabilization period. Once that was over, there'd need to be a decision; whether GC should be always enabled, a configuration parameter or a compile-time option. Until that decision had been taken, randomly breaking source code compatibility made no sense.

David provided a laudably cool-headed analysis of the options. All things well and truly considered, he concluded that the ZEND_GC switch was actually a good idea for now, so long as it will be removed when garbage collection is integrated. Andi wrote that he could commit the macro patch into CVS now, or the Zend team would happily commit it if David didn't have karma. David explained that he didn't even have a CVS account, but would need one in future so that he could respond to any bugs arising from his changes. Adam Maccabee Trachtenberg popped up out of nowhere with a link to the account request page. And that - give or take a few quibbles over the nature of the testing - was that.

Short version: The macros await David's CVS account approval.

TLK: Namespaces and autoload [continued]

François Laupretre queried Dmitry Stogov's assumption, in the __autoload()/namespaces discussion last week, that it was okay for autoload handlers to throw errors or exceptions under certain circumstances. They shouldn't, and in fact if the handler were registered through SPL, they wouldn't. When it came to the __autoload() function itself, there was no point in raising an error when a symbol is not found because the PHP interpreter does it anyway. François therefore proposed that PHP should ignore any error or exception raised from autoload handlers - which would render Dmitry's proposed additional argument useless.

That said, François also had a plan to add a second argument to autoload handlers, but with a different angle. He'd like to pass the type of symbol being sought; class or interface. At present, autoload handlers have to try both. This isn't generally a problem, since most current autoload handlers are primitive filename-based efforts that treat classes and interfaces in the same way, but what will happen if/when support for function and constant autoloading is added to PHP?

Stas agreed that it doesn't make a lot of sense to raise errors in a chained autoloader, since the next in line might be able to load the class, but had concerns about the performance impact of exhausting all autoloading opportunities. It also wouldn't be very friendly to have all that searching going on just for something like $foo = new DateTime() - if someone didn't have their own DateTime class, they shouldn't have to pay for the possibility of having one. Stas didn't quite get François' point about "having to try both" class and interface, though, and asked him to explain more fully.
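
The shape of the discussion can be modelled in a few lines of C. This is a toy model only, not PHP's autoload implementation: the handler signature, the symbol_kind enum, and the App_User / App_Storable names are all hypothetical. Handlers report failure by returning false rather than raising anything, they receive the kind of symbol being sought as a second argument, and the chain stops at the first success.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef enum { SYMBOL_CLASS, SYMBOL_INTERFACE } symbol_kind;

/* A handler returns true if it loaded the symbol; it never raises. */
typedef bool (*autoload_handler)(const char *name, symbol_kind kind);

/* Try each handler in order and stop at the first success.
 * If tried is non-NULL, it receives the number of handlers consulted. */
bool run_autoload_chain(const autoload_handler *chain, size_t count,
                        const char *name, symbol_kind kind, size_t *tried)
{
    assert(chain != NULL && name != NULL);
    size_t i;
    bool loaded = false;
    for (i = 0; i < count && !loaded; i++) {
        loaded = chain[i](name, kind);
    }
    if (tried != NULL) {
        *tried = i;
    }
    return loaded;
}

/* Toy handler that only knows one class. */
bool load_app_classes(const char *name, symbol_kind kind)
{
    return kind == SYMBOL_CLASS && strcmp(name, "App_User") == 0;
}

/* Toy handler that only knows one interface. */
bool load_app_interfaces(const char *name, symbol_kind kind)
{
    return kind == SYMBOL_INTERFACE && strcmp(name, "App_Storable") == 0;
}
```

In this model a lookup that nobody can satisfy, such as a user-space DateTime, consults every registered handler before giving up. That per-miss cost is the performance concern Stas raises.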

Short version: A staggered conversation that may end some time in 2009.

TLK: Constant folding optimization

Nuno presented the Zend team (and the internals list) with a patch to implement constant folding optimization in the Zend Engine. Could they please review it? He'd found two test regressions with the patch, both of which appear because of the "division by zero" warning; this problem would need to be fixed prior to committal.
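
For anyone unsure what constant folding means: the compiler evaluates expressions whose operands are all known at compile time and stores the result, so that 2 + 2 costs nothing at run time. Below is a minimal sketch on a toy expression tree, not Nuno's patch. The node type and fold_constants() are hypothetical, arithmetic is integer-only with overflow ignored, and division by zero is deliberately left unfolded so that a run-time warning could still fire, which is one plausible way to handle the regression mentioned above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { NODE_CONST, NODE_VAR, NODE_ADD, NODE_MUL, NODE_DIV } node_kind;

/* Hypothetical expression node; value is valid only for NODE_CONST. */
typedef struct node {
    node_kind    kind;
    long         value;
    struct node *lhs;
    struct node *rhs;
} node;

/* Fold n in place, bottom-up.
 * Returns true if n is a constant after folding. */
bool fold_constants(node *n)
{
    assert(n != NULL);
    if (n->kind == NODE_CONST) {
        return true;
    }
    if (n->kind == NODE_VAR) {
        return false;                /* value unknown until run time */
    }

    /* Fold both children first, even if one turns out non-constant. */
    bool left  = fold_constants(n->lhs);
    bool right = fold_constants(n->rhs);
    if (!left || !right) {
        return false;
    }

    long a = n->lhs->value;
    long b = n->rhs->value;
    switch (n->kind) {
    case NODE_ADD: n->value = a + b; break;
    case NODE_MUL: n->value = a * b; break;
    case NODE_DIV:
        if (b == 0) {
            return false;            /* keep the division for run time */
        }
        n->value = a / b;
        break;
    default:
        return false;
    }

    /* Replace the operator node with its computed constant. */
    n->kind = NODE_CONST;
    n->lhs = NULL;
    n->rhs = NULL;
    return true;
}
```

Given a tree for 2 + 2, the root becomes a NODE_CONST holding 4. Given 1 / 0, the tree is left alone and the function returns false.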

Stas liked the idea, but thought the patch should enable constant expressions in constant contexts, e.g. function foo($a = 2 + 2) or const $a = 2 + 2. If the functionality were moved to parser level, it could do this too. Nuno pointed out that this syntax isn't actually supported by PHP at present; he'd quite like it to be added, but it would mean a few grammar rule modifications. He'd look into it, if people were interested...?

Stas confirmed that this was what he'd had in mind. It wouldn't break anything, and it would allow a little more freedom; 'let's see if anyone objects.'

Short version: Hands up if you have any idea what they're talking about...

CVS: zend_alter_ini_entry fixed

Changes in CVS that you should probably be aware of include:

  • Core bug #42468 (Write lock on file_get_contents() fails when using a compression stream) was fixed [Ilia, Jani]
  • In ext/pgsql, bug #42506 (php_pgsql_convert() timezone parse bug) was fixed [Ilia]
  • In ext/mbstring, elderly bug #29955 (invalid case conversion in iso-8859-9) was fixed [Rui Hirokawa]
  • Zend Engine bug #42541 (Check for namespace decl. on first line doesn't work when extended info is on) was fixed in CVS HEAD [Dmitry]
  • In the CGI SAPI, bug #42523 (PHP_SELF duplicates path) was fixed [Dmitry]
  • In ext/soap, bugs #42488 (SoapServer reports an encoding error and the error itself breaks) and #42214 (SoapServer sends clients internal PHP errors) were fixed [Dmitry]
  • In the Apache SAPI, bug #42579 (apache_reset_timeout() does not exist) was fixed [Jani]

In other CVS news, Jani took Ilia to task over his failure to merge changes made in the 5_2 branch to CVS HEAD, at one point demanding that Ilia's CVS account be revoked. Andrei Zmievski intervened to ask Jani to calm down, and Ilia made time to merge the missing patches. A mollified Jani wrote to Stas about the zend_alter_ini_entry() problem Stas had enquired about at the end of last week, but found a workable solution while waiting for a response. Uwe Schindler tested Jani's patch, and it went into the Zend Engine at the end of the week.

Andrey Hristov was also busy this week. He fixed ext/mysqli bug #42378 (bind_result memory exhaustion), and also a regression failure he'd found there, bug #38710 (data leakage because of nonexisting boundary checking in statements). Tony promptly wrote to inform him that the regression test supplied with the latter was failing on his machine. Andrey managed to reproduce the failure on his own machine by switching MySQL servers from version 5.1 to 5.0, and wrote that the 5.0 version reports bad metadata. He fixed the failing test by altering its expected return value according to the server version.

Short version: The mysql/mysqli test coverage is starting to put everyone else to shame.

PAT: No good solution

Having been reminded about the universal binary build fix earlier in the week, Tony applied the eventual fix given by PHP user Christian Speich a couple of weeks back to the Zend Engine and affected extensions. Unfortunately, he forgot to credit Christian for his part in this; hopefully mentioning it here will make up for that a little.

Some more FastCGI SAPI code from Mattias Bengtsson arrived, this time adding checks for malformed FastCGI requests. Dmitry committed the checks.

François continued to argue with Rui Hirokawa that bug #42396 is a bug, and has been ever since the __HALT_COMPILER() token was introduced, because the presence of NULL bytes no longer reliably indicates Unicode encoding. One solution might be simply to document it and banish the use of __HALT_COMPILER() from Unicode encoded scripts, but this would render the token 'almost useless'. The solution François had proposed in his patch might not be elegant, but at least it would mean that __HALT_COMPILER() and zend_multibyte were no longer incompatible. Given that the token is a recent innovation and not widely used, he considered the performance hit acceptable for now.

Sensing that this was unlikely to go far with Rui, François appealed to Greg and Marcus at the end of his email for support or for better ideas. Greg responded, but it wasn't good news; he couldn't see a solution without making changes to PHP itself. That said, the declare(encoding) statement introduced in PHP 6 would at least remove the guesswork regarding a file's encoding. Rui wrote at this point to let both archive authors know that declare(encoding) is already supported in ext/mbstring, and has been since PHP 4.3. He suggested that they set detect_unicode off, and/or add declare(encoding) on the first line of their scripts.

Short version: That zend_multibyte patch is going nowhere, but the available solutions aren't great.
