Zend Weekly Summaries Issue #358


TLK: PHP 5.3 feature poll
TLK: INI parser caching
TLK: Phar agonistes
TLK: Constant folding optimization [continued]
FIX: run-tests.php
PAT: array_get
RFC: Multiple namespaces per file
TLK: Accessing global namespace
FIX: mail.force_extra_parameters
RFC: Marking functions as const
CVS: Namespace/autoload problem resolved
PAT: LSB, GC discussions

9th September - 15th September 2007

TLK: PHP 5.3 feature poll

Release Master hat firmly in place, Ilia Alshanetsky opened a new thread to discuss the shaping of PHP 5.3. Since his request for feature suggestions a couple of weeks ago, he'd collected 'a substantial list of key changes' that people would like to see in the 5.3.0 release. Ilia therefore asked all interested parties to vote on the following items:

  • Namespace support
  • pecl/intl as a core extension, disabled by default (external library)
  • Late static binding
  • David Wang's circular garbage collection patch
  • Support for SQLite 3 in ext/sqlite
  • Removal of safe_mode, register_globals and magic_quotes
  • mysqlnd in the core, used as a backend for ext/pdo_mysql and ext/mysqli and possibly enabled by default
  • Support for OpenID in ext/openssl
  • New array functions, array_replace[_recursive]()
  • E_DEPRECATED
  • The zend_arg_info const'ify patch
  • The GCC 4 -fvisibility patch
  • A switch for disabling/enabling materialized cursors in ext/mysqli
  • pecl/phar as a core extension (possibly enabled by default)
  • Matt Wilmas' ZEND_SIGNED_MULTIPLY_LONG() optimization
  • New php.ini parser/scanner and CGI/FastCGI .htaccess-style INI support
  • __callStatic()
  • "strict classes" that do not permit dynamic property creation

We all did our best to oblige but 'I am not qualified to offer an intelligent opinion' (as Larry Garfield had it) was the only honest response most of us could give in some cases.

Surprisingly few of the inevitable side issues went as far as a sub-thread; most people just voted as requested. Lester Caine went a little crazy when he saw magic_quotes and friends on the list and had to be calmed down, although most figured that that particular suggestion wouldn't get through even if everyone voted for it. Rob Richards wanted to know which OpenID implementation was under discussion (Dmitry Stogov's). Most voters didn't want to see the relatively untested mysqlnd library or garbage collector enabled by default in 5.3.0, but were happy to have both in the PHP core. Cristian Rodriguez, although being all for -fvisibility support, wrote that the patch to implement it doesn't actually work on his box. I read the "strict classes" option the wrong way, but was rescued by Mike Wallner, who realized. Stas Malyshev, who didn't quite get the point of E_DEPRECATED, listed a bunch of items that were missing from Ilia's list: dynamic class access, the (binary) operator, always-enabled FastCGI, nowdocs, goto, __construct() in interfaces and constant folding at compile-time. François Laupretre seized the opportunity to have a pop at the decision process, which sounded like sour grapes to Marcus Börger.

Short version: Unusual restraint all round.

TLK: INI parser caching

Andi Gutmans wasn't too sure about that proposal for .htaccess-style INI support. He wrote that it should be optional, if it goes in at all, because the stat() calls involved would hit performance quite hard. Pierre-Alain Joye informed him that the plan was to cache the entries and have a single call to stat() when TTL is reached. Andi was a little happier, but asked that the implementation be generic in TSRM so that it could be used across PHP. Jani Taskinen pointed out that actually his patch caches already. However, he'd skipped calling stat() at all; his implementation just re-scans existing files when the cached results expire, since he'd thought it best to keep things simple. Pierre looked again at Jani's patch and agreed with him; it would be easy to maintain, work well and keep the performance impact to a minimum.

Alexey Zakhlestin thought there might be a better approach. He wrote that some operating systems - including the BSD-based systems, Linux and Windows - allow subscription to notifications from the filesystem. Stas saw where Alexey was going with this, and wondered aloud how expensive those notifications would be. Pierre clarified a few points about Jani's solution; the INI cache works on a per-directory basis, caching the entries found in the current and parent directories. The TTL that controls the checking is configurable; to illustrate, the htscanner tool has it set to 5 minutes by default. Alexey pointed out that using kqueue and friends would mean the TTL part could be skipped altogether. PHP would simply register itself as a subscriber to filesystem events in the relevant directories, and could then react immediately to any changes in them. He agreed with Stas, though, that the cost of filesystem notifications should be looked into.

Pierre wasn't certain, but believed that there would need to be a local session manager for each connection/registration. Besides, how would the notifications be sent to PHP? All things considered, there were advantages to the TTL system, and not least among them were simplicity and portability.
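The TTL approach described above can be sketched roughly as follows. This is a minimal illustration only, not Jani's patch: the DirIniCache class, its method names and the scanner callback are all hypothetical, and the sketch keys the cache on a single directory rather than walking parent directories. The one behaviour taken from the discussion is that no stat() call is made; the directory is simply re-scanned once the cached entry has expired.

```php
<?php
// Hypothetical sketch of a per-directory INI cache with a TTL.
// Nothing here is taken from the actual patch; names are illustrative.
final class DirIniCache
{
    /** @var array<string, array{expires: int, entries: array}> */
    private array $cache = [];

    /** @var callable(string): array  Re-reads the INI entries for a directory. */
    private $scanner;

    private int $ttl;

    public function __construct(callable $scanner, int $ttl = 300)
    {
        $this->scanner = $scanner;
        $this->ttl = $ttl; // seconds; 300 mirrors the 5-minute default mentioned above
    }

    /** Returns cached entries, re-scanning only when the entry has expired. */
    public function get(string $dir, int $now): array
    {
        $hit = $this->cache[$dir] ?? null;
        if ($hit !== null && $now < $hit['expires']) {
            return $hit['entries']; // still fresh: no filesystem access at all
        }

        // Expired or never seen: re-scan and restart the TTL clock.
        $entries = ($this->scanner)($dir);
        $this->cache[$dir] = ['expires' => $now + $this->ttl, 'entries' => $entries];

        return $entries;
    }
}
```

Passing the current time in as an argument keeps the sketch easy to exercise; a real implementation would presumably read the clock itself.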

Short version: It all depends on who you think is the boss.

TLK: Phar agonistes

Another sub-thread from the feature voting was sparked when Andi voted against pecl/phar going into the core. Andi reiterated his past argument, that he'd prefer a standard format that could be manipulated with standard tools, and added a new one; the Zend team's testing had shown that the TAR format gives better performance. That said, he wasn't clear about phar's use case, and didn't think Web applications were its target. Dmitry backed Andi, writing that he'd also prefer 'something more toolable (like TAR or ZIP)'. Greg Beaver, whose tests had shown just the opposite, wanted to know exactly how they'd benchmarked performance, but no immediate response was forthcoming.

Greg therefore sat down and wrote quite a long email defending his baby. Phar isn't just an installation stub; it works quite well in a Web environment and, due to the lack of stat() calls needed, it's even possible that applications running directly from a phar archive will eventually be faster than their filesystem equivalent when used alongside APC. Re-implementing phar as JAR would require changes to the Zend Engine because it uses the ZIP file format; re-implementing it as TAR would require restructuring the file for performance reasons, rendering third-party tools useless. Long story short, phar was implemented in the way it was for sound reasons.

Furthermore, there is a cross-platform tool, phar.phar, which is provided as part of the phar extension and which, although not yet feature complete, is capable of viewing, extracting and adding files to a phar archive. Greg would be happy to look at other format implementations, but he'd first need to hear some compelling arguments for change; preferably, arguments that would take into account the work already completed.

Andi clarified. He hadn't intended to offend Greg, he'd simply expressed his opinion. Perhaps Dmitry could send Greg something that would explain why the Zend team achieved better results with TAR? Not that Andi was 'married to TAR'; any other familiar archiving format would be just as viable. Outside of anything else, the introduction of a new format left several unanswered questions about deployment. Where could he find more information? Would mod_rewrite be needed? How would phar work under Windows?

Marcus pointed out that, as far as 'toolable' archive formats go, PHP already has ext/zip in the core; he, for one, wouldn't mind if someone provided a working TAR implementation there too. The purpose of phar was very different. Marcus wondered, too, how the Zend team had obtained their test results; he suspected they must have used an old version of phar. Dmitry heightened that impression when he posted a TAR format variant of phar, in response to Andi's earlier request. He'd written this himself as a proof of concept and had used it in the Zend team's tests, 'some time ago'. He was, however, unable to confirm precisely which version of phar had been used in their benchmarks.

François, of all people, stoutly defended Greg's work. He explained that, in both pecl/phar and his own project, PHK, the TAR format had been rejected because TAR access is sequential. If the Zend team could prove that it was faster to reach the 100th file of a 1Mb archive using TAR, he for one would be prepared to consider rewriting his project to use it. That said, the files would then need to be precisely organized within the archive, which would incur a performance hit. The only option would be to lose the stub file. Losing the stub would also mean losing certain features. For both phar and PHK, one such feature would be the independence from external libraries; another would be the ability to directly include an archive file from a PHP script.

As if that weren't enough, re-implementing either phar or PHK to use the TAR format would result in '"pseudo-tar" files'; archives that could be viewed and extracted using tar itself, but which could be generated only through the extension. Files generated using tar would appear to be the same, but would not work with either extension...

Short version: Greg finally gets François on-side - but is it too late?

TLK: Constant folding optimization [continued]

Following on from last week's discussion with Stas, Nuno Lopes had tried implementing support for "constant expressions within constant contexts." He reported that it was much more complicated than he'd expected; his modification to the static_scalar grammar rule had produced hundreds of shift/reduce conflicts. Ignoring the compiler warnings led to a version of PHP that couldn't process PHP scripts...

Dmitry went back to the initial proposal, and spotted a problem with the concept. If constant expressions were to be supported, constants within expressions should also be supported - but the values of those constants may be unknown at compile-time. Delaying constant initialization for expressions would entail keeping an Abstract Syntax Tree, which would need to be evaluated at run-time by zval_update_constant().

Stas didn't see why constants should be supported in expressions across the board, but Dmitry pointed out that it doesn't make a lot of sense to support constant expressions and not support constants inside them. Marcus backed Dmitry's observations; he'd looked into this some time ago, and had found that it isn't a trivial challenge. All too often in PHP there are constants that would be enumerated flags in other languages, e.g.

class Week {
    const Monday = 0;
    const Tuesday = Monday + 1;
    // ...
}

or:

class Logging {
    const INFO = 0;
    const WARN = INFO + 1;
    const FAIL = WARN + 1;
    // ...
}

However, Marcus would like to see support for it, if it proved possible; 'it would make my enum implementation in pecl/spl_types more handy'.

Short version: So that's what they're talking about!

FIX: run-tests.php

Zoe Slattery wrote to the QA list with a problem. The following test script would pass:

--TEST--
Testing regex matching in run-tests.phpt
--FILE--
<?php
echo "Warning: something wrong in function red at line 10\n";
echo "Warning: something wrong in function green at line 13\n";
echo "Write whatever you like, it will be swallowed at line 16\n";
?>
--EXPECTF--
Warning: something wrong in function %s at line %d
Warning: something wrong in function %s at line %d

The only way to make the test respond as expected was to put the actual line numbers in the output rather than using %d. Should Zoe now recommend that tests be written with the actual line number as expected output?

Marcus explained why not, but added, oddly, that he'd come across the issue before. It took an attempted fix from Zoe involving extra comment lines before he realized that %s was actually catching newlines and the test suite was in fact broken, at which point he copied the exchange to internals@. Zoe made it clear that regex isn't her strongest point, and Marcus offered her [^\n]+? as a potential solution. It seemed to work, except that several tests now failed that didn't fail before.

Nuno Lopes intervened at this point to propose a patch, explaining that it wasn't actually possible to fix the current regex for %s without breaking everything in sight. His patch offered the choice of %s as the result of matching [^\r\n]+, or %a as the result of a match for .+. Applying it would mean that some EXPECTF strings needed altering from %s to %a, but the more important thing was that when testing the patch Nuno had found test scripts with the wrong expected output. He was therefore keen to get the fix committed.
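To make the difference between the two placeholders concrete, here is a minimal sketch. It is neither the run-tests.php code nor Nuno's patch: expectf_to_regex() is a hypothetical helper, and the mapping of %d to \d+ is an assumption. Only the %s and %a patterns come from the description above.

```php
<?php
// Hypothetical helper: turn an EXPECTF template into an anchored regex,
// using the placeholder meanings described for the proposed patch.
function expectf_to_regex(string $expectf): string
{
    // Escape regex metacharacters in the literal text; '%' is left untouched.
    $quoted = preg_quote($expectf, '/');

    $regex = strtr($quoted, [
        '%s' => '[^\r\n]+', // one or more characters, never crossing a line break
        '%a' => '.+',       // anything at all, line breaks included (see /s below)
        '%d' => '\d+',      // assumption: digits only
    ]);

    // /s lets '.' match newlines, which is what allows %a to span lines.
    return '/^' . $regex . '$/s';
}

$expected = "Warning: something wrong in function %s at line %d\n"
          . "Warning: something wrong in function %s at line %d";

$output = "Warning: something wrong in function red at line 10\n"
        . "Warning: something wrong in function green at line 13\n"
        . "Write whatever you like, it will be swallowed at line 16";

// With the line-bound %s, the stray third line is no longer swallowed.
$strict = preg_match(expectf_to_regex($expected), $output);

// Swapping %s for %a accepts it, since '.+' can run across lines.
$loose = preg_match(expectf_to_regex(str_replace('%s', '%a', $expected)), $output);
```

Here $strict ends up as 0 and $loose as 1: run against the sample from Zoe's report, the three-line output is rejected under the narrower %s and accepted once the template uses %a.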

Johannes Schlüter tested Nuno's patch and was prepared to back it. Zoe, however, was still hopeful of getting the problem fixed without having to rewrite half the PHP test suite, and offered a different solution. She believed that replacing

if (preg_match("/^$wanted_re$/s", $output))

with

if (preg_match("/^$wanted_re$/", $output))

would work, since that /s modifier is responsible for the greediness of the match. Then she looked again, and withdrew her patch.

Marcus wasn't sure that Nuno's \r should be in there at all - wouldn't it cause problems to disallow it? - but Nuno mentioned old Macs. Zoe, though, reported seeing PCRE compilation warnings when she tried Nuno's fix. It took her some minutes to realize these were coming from unmodified test scripts.

Short version: If you're writing scripts for the PHP test suite, take note.

PAT: array_get

Andrew Shearer turned up with a patch against CVS HEAD implementing the array_get() function he proposed a couple of months ago, test script and all. He added that someone had independently posted a similar idea as a feature request for PHP 5, and his implementation could be backported to fulfill that. The original specification from the feature request made up the rest of his email.

Tony Dovgal wanted to know what the difference was, if any, between Andrew's proposal and:

<?php

function array_get(&$array, $key, $default) {
    if (isset($array[$key])) {
        return $array[$key];
    }
    return $default;
}

?>

Andrew explained that the main idea was to enable people to write cleaner PHP code, since this is - or should be - an oft-used snippet. However, there were some technical differences in array_get(); he referred Tony to the spec. Marcus didn't like it much either; he grumbled that it was inflexible, and 'far away from what ifsetor was meant to be'. It couldn't check whether the array exists; it couldn't manage multilevel queries; it couldn't cope with other types in queries, such as objects. ifsetor() could do all these things. There wasn't the potential to return a writeable reference so that a non-existing key could be created. ifsetor() has that potential. Finally, Marcus disliked the idea of passing the array to the function by reference; it would be very slow, and was unnecessary here.

Andrew agreed with that last point; this wasn't part of his function, just part of someone else's feature request. He'd love to see a workable proposal for ifsetor(), but since there wasn't one, array_get() could help with many of its common use cases. You could, for example, separate the existence checks for array and key with:

$value = isset($array) ? array_get($array, 'mykey') : FALSE;

or use nested calls to array_get() to achieve multilevel queries. Contrary to Marcus' earlier claim, array_get() in fact has support for object members. The writeable reference returned by ifsetor() would be great, if ifsetor() existed, but he reminded Marcus that there has been no concrete proposal for it. Marcus pointed him to the ?: shortcut he put into CVS HEAD as a halfway house for ifsetor(). Andrew had already investigated it, and explained that it doesn't serve the same purpose because it throws an E_NOTICE for missing values. The shortcut could, however, be used together with array_get() to overcome the limitations of both:

array_get($_GET, 'foo') ?: slowDefaultCalculation();

Marcus suggested using @ to silence the E_NOTICE. Robert Cummings pointed out that this would be an expensive assignment; he thought the whole point of ifsetor() was to kill undefined index notices for the left operand. Marcus, however, believed this was the best that could be hoped for in PHP 5.3, although there is still the possibility of a real ifsetor() in PHP 6 if the team ever come to a consensus over it. This came as a surprise to Andrew, who had been under the impression that ifsetor() was rejected long ago. Still, array_get() could provide the functionality that was most often needed, while avoiding the issues that prevented acceptance of ifsetor() - one of which was backward compatibility, since there's no way to write a userland version of it. Writing a BC function to replace array_get(), on the other hand, was a simple matter:

if (!function_exists('array_get')) {
    function array_get($arr, $key, $default = false) {
        if (array_key_exists($key, $arr)) {
            return $arr[$key];
        } else {
            return $default;
        }
    }
}
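One visible difference between the two userland snippets quoted in this thread is the existence check: Tony's version uses isset(), Andrew's uses array_key_exists(). The sketch below puts the two side by side. The function names are hypothetical, chosen only to keep the variants apart, and it makes no claim about what the array_get() spec itself does with null values.

```php
<?php
// Hypothetical names; each body mirrors the existence check used in one
// of the two userland snippets quoted in the discussion.
function array_get_isset(array $array, $key, $default = false)
{
    // isset() is false when the key is missing OR when its value is null.
    return isset($array[$key]) ? $array[$key] : $default;
}

function array_get_exists(array $array, $key, $default = false)
{
    // array_key_exists() only asks whether the key is present.
    return array_key_exists($key, $array) ? $array[$key] : $default;
}

$row = ['name' => 'phar', 'maintainer' => null];

$viaIsset  = array_get_isset($row, 'maintainer', 'unknown');  // 'unknown'
$viaExists = array_get_exists($row, 'maintainer', 'unknown'); // null
$missing   = array_get_exists($row, 'licence', 'unknown');    // 'unknown'
```

The two only disagree when a key is present but holds null, so which check is appropriate depends on whether a stored null should count as "set".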

Of course, accepting array_get() now would not preclude bringing up ifsetor() again in the future...

Lukas Smith believed it would. He didn't see that anything material was actually preventing an ifsetor() implementation; he uses the kind of functionality it could offer on a daily basis, and as such would prefer it to be an operator and available in PHP 5.3. Marcus' shortcut wasn't helpful to him; Andrew's solution was close enough to solve his real world needs, and 'if I could just see the slightest bit of a real argument against ifsetor(), I might even vote for array_get()', despite his dislike of the name.

Marcus didn't have a new name for it. However, he felt that, if array_get() were to go into the core as a halfway solution it should be made more useful.

Short version: Not the Holy Grail, but a shiny cup nonetheless.

RFC: Multiple namespaces per file

Following community feedback over the namespace implementation, Stas wrote, he and Dmitry had tried to find a simple model that would allow multiple namespaces per file, and would allow multiple namespaced files to be bundled together without modifications. They'd arrived at the conclusion that it was possible, but only if a file containing namespaces contains no code other than namespaced code. For example,

class X {}
namespace A;
class Y {}

or

require 'foo/bar.php';
namespace A;
class X {}

wouldn't work, but

namespace A;
class X {}
namespace B;
class Y {}

or

namespace A;
require 'foo/bar.php';
class X {}
namespace B;
class Y {}

would be fine. The question was, would this be an acceptable solution for those who had wanted multiple namespaces per file? Just to be safe, Stas added a postscript: 'This is *not* a "should we use braces" thread, so please don't :)'.

David Coallier was the first to respond. He wanted to know if it would be possible to have something like an endnamespaces keyword that would act as a delimiter and close each namespace definition. Stas immediately spotted that this was in fact the braces argument without the braces. He reiterated the main problem with it: when several namespaced files are combined, global spaced code in amongst the namespaced code starts to look and act in a 'seriously weird' way. David backed down, and agreed that the solution on offer would actually work well for him. However, he'd like to know whether somefile.php would now be included in the B namespace in the following scenario:

<?php

namespace A;
require 'somefile.php';
class AX {}

namespace B;
class BX {}

?>

Stas pointed out that somefile.php might well contain a namespace definition of its own; it would be parsed as an entirely separate entity. Besides, require is evaluated at run-time, so cannot influence compile-time namespaces. The namespaces and the included file can't know about each other.

Marcus felt that the whole thing looked 'very messy'; he'd prefer to either go with the curly braces or stay with the single namespace declaration at the head of a file. This implementation lacked clarity, and even if the maze of namespaces panned out as expected you'd need an IDE to get through them.

On the other hand Lukas, who had been one of those requesting multiple NS per file, believed the limitations would be acceptable.

Dmitry explained to Marcus that you can actually spread one namespace across several files, both in the current implementation and in the proposed multiple namespace implementation. Greg blamed Stas' initial explanation for the lack of clarity. The goal wasn't to encourage 'a development paradigm' of multiple namespaces in the same file, but to make it possible to have multiple files containing namespaces combined into a single file. Greg found the proposed syntax difficult to read, but saw this as a good thing; it would discourage development along those wrong lines. If everyone viewed the proposal this way, they'd see it made much more sense; 'It's basically one-namespace-per-file, but PHP allows you to virtually combine files so that at each namespace declaration, the imports are reset.'

Greg was too late to persuade Larry Garfield, who hated the syntax because it doesn't follow the structure of the parser. Larry agreed, though, that mixing namespaced code with global code would be 'all kinds of confusing', whichever way it was managed.

Short version: This is starting to sound like the goto discussions.

TLK: Accessing global namespace

Richard Quadling was confused about global namespaces. If you were operating within a given namespace, and an entity within that namespace happened to share its name with some entity in the global namespace, you wouldn't need to use the namespace prefix for the former. How, then, would you access the entity with the same name existing in the global namespace?

Emil Ivanov believed there was a null prefix. To differentiate between namespaced and global instances of, say:

classB->method_c()

from within a namespace, you'd need to reference:

::classB->method_c()

in order to access the one in global scope.

Marcus concurred, explaining that this solution a) means no new keyword is needed and b) is in line with other languages.

Short version: It's easy when you know how.

FIX: mail.force_extra_parameters

Stas had a proposal to disallow setting mail.force_extra_parameters from .htaccess. He reasoned that the directive allows arbitrary arguments to be passed to the mail binary, and some mail tools will take parameters that allow the reading and writing of arbitrary files. Stas believed that mail.force_extra_parameters should only be altered by the systems administrator in any case, so he didn't see a problem with removing a way to override the system settings. Did anyone have any objections?

There were no dissenting voices, and Stas committed the change into CVS HEAD and PHP_5_2 at the end of the week.

Short version: Another security hole bites the dust.

RFC: Marking functions as const

Following the demise of his patch for constant folding optimization last week, Nuno came up with a new proposal. Some functions, when fed with constant arguments, will always return a constant value - for example,

strlen('abcd') === 4;

In such cases, it would be possible for an optimizer to do the transformation. In fact, he believed that Ilia already had a list of such functions in his own optimizer. Nuno thought it would be better to have that list in the PHP core, so that everyone could benefit from it. All that would be required would be a change from PHP_FE() to PHP_CONST_FE() in the function tables for the affected functions.
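The transformation an optimizer could perform with such a list can be sketched in userland PHP. This is an illustration under stated assumptions, not Nuno's patch or Ilia's optimizer: the whitelist contents, the fold_call() function and the array-based representation of a call are all hypothetical. A non-scalar argument (here, an array such as ['var' => 's']) stands in for a value that is unknown at compile time.

```php
<?php
// Hypothetical whitelist; the summary does not say which functions
// would actually be marked with PHP_CONST_FE().
const DETERMINISTIC_FUNCTIONS = ['strlen', 'abs', 'str_repeat'];

/**
 * Folds a call into a constant when the function is whitelisted and
 * every argument is a known scalar; otherwise leaves the call in place.
 */
function fold_call(string $name, array $args): array
{
    $allConstant = true;
    foreach ($args as $arg) {
        if (!is_scalar($arg)) { // placeholder for a run-time value
            $allConstant = false;
            break;
        }
    }

    if ($allConstant && in_array($name, DETERMINISTIC_FUNCTIONS, true)) {
        // Safe to evaluate once, ahead of time.
        return ['kind' => 'constant', 'value' => $name(...$args)];
    }

    return ['kind' => 'call', 'name' => $name, 'args' => $args];
}

$folded  = fold_call('strlen', ['abcd']);          // folded to the constant 4
$dynamic = fold_call('strlen', [['var' => 's']]);  // argument unknown: left as a call
$unsafe  = fold_call('rand', [1, 10]);             // not whitelisted: left as a call
```

Keeping the whitelist explicit means a function is only folded once someone has decided it really is free of hidden inputs.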

Nuno supplied a link to his new patch, which also contains a few function entries that have already been changed.

Hartmut Holzgraefe noted that the example Nuno had given would be called "deterministic" rather than "const". Nuno agreed, but added that his proposed changes shouldn't have any additional side effects.

PHP user Peter Brodersen liked the idea, but wondered if the ability to change the encoding charset at run-time wouldn't break the example? He assumed that it would be important to be aware of settings that could potentially alter the results of individual functions, and use that awareness to decide whether a function really is deterministic at this level.

Short version: It's probably trickier than it sounds.

CVS: Namespace/autoload problem resolved

Changes in CVS that you should probably be aware of include:

  • In ext/mysql, bug #42549 (ext/mysql failed to compile with libmysql 3.23) was fixed [Scott MacVicar]
  • Zend Engine bug #42590 (Make the engine recognize \v and \f escape sequences) was fixed [Ilia]
  • In the CGI SAPI, bug #42587 (behaviour change regarding symlinked .php files) was fixed [Dmitry]
  • Session bug #42596 (session.save_path MODE option does not work) was fixed [Ilia]
  • Alpha support for imagefilter's IMG_FILTER_COLORIZE was added to both libgd and the gd extension [Pierre]
  • Core bug #39651 (proc_open() append mode doesn't work on Windows) was fixed [Nuno]
  • In ext/bz2, bug #42627 (bz2 extension fails to build with -fno-common) was fixed [Jani]
  • PDO bug #42643 (CLI segfaults if using ATTR_PERSISTENT) was fixed [Ilia]

In other CVS news, Dmitry committed his namespace/__autoload() solution into CVS HEAD. In cases of ambiguity, __autoload() will now be called only after checking for the classname both in the current namespace and in the global (internal class) namespace. Marcus, who had just been through a very long discussion with François, Paweł Stradomski and Stut about the differences between classes and interfaces in PHP and the reasons why autoloaders would need to be something completely new to cope with constants and functions, greeted Dmitry's commit with a heartfelt 'nice work!'

Back in the 5_2 branch, Andrey Hristov fixed an apparently unreported ext/mysqli bug he thought had been fixed long ago, which caused Windows builds to throw a warning and leak memory on thread exit.

Andi extended Stas' karma and unleashed him onto the CVS account waiting list. Stas promptly exercised his new rights by giving the entire ICU group access to the unicode module in PECL.

Short version: David Wang still didn't get his CVS account.

PAT: LSB, GC discussions

Etienne Kneuss finally responded to Dmitry's request for a comparison between his most recent late static binding patch and the one the Zend team had come up with. Etienne explained that he was on vacation and didn't have internet access on a regular basis. As for Dmitry's question about callback support, he'd deliberately left support for callbacks out of his patch because he'd planned to do some clean-up work there.

Etienne had managed to look through the patch Dmitry sent, and noted that the Zend implementation seemed quite different to - and much larger than - his own. However, he had been unable to patch it onto current CVS HEAD. Could Dmitry please post an updated version? Dmitry promptly obliged.

Marcus, meanwhile, was back onto the garbage collection theme. He felt that David Wang's reference macro patch should go directly into PHP 5.3, since binary compatibility would be broken there anyway. His only query was whether it would make sense to use the __ prefix internally, in the same way that it's already used in PHP. Marcus also put in a plea against 'magic switches that lead to broken code.' Wasn't it more normal policy for Zend to try to break new things to find where the problems lay, rather than to try and avoid breakage at the outset?

Cristian Rodriguez got the wrong end of the stick and assumed Marcus was talking about an INI switch for garbage collection, which led him to grumble yet again about 'the unicode.semantics switch thingy'. David Wang put Cristian straight; the refcount manipulation macros are just that, and have nothing to do with garbage collection. In fact, he added, calling the switch ZEND_GC was misleading. He neglected to mention that it's also a compile-time switch.

Short version: A week of confusions.
