Zend Weekly Summaries Issue #344
TLK: Benchmark suite
TLK: Arrays vs stdClass
TLK: Late static binding [again]
RFC: SPL and PCRE always on
FIX: Class comparison
FIX: Persistent MySQL connections
CVS: php_admin holds its own
PAT: Unicode shortcuts
TLK: Benchmark suite
Tony Dovgal wanted to know whether Sebastian Bergmann had done anything further with the refactored version of bench.php he shared with internals@ some weeks ago.
Sebastian explained that, following discussion with Marcus Börger at the time, he'd concluded that the best approach would be to rewrite run-tests.php based on the Iterator approach he'd used for the benchmark system's prototype. Benchmark scripts would then be .phpb, alongside the test suite's .phpt scripts. However, following negative feedback off-list over the idea of "fixing something that isn't broken" (i.e. run-tests.php), Sebastian had stopped work on the project.
The options he could see at this point were:
- Add .phpb style support for benchmarks to the current run-tests.php
- Improve run-tests.php to also collect Peak Memory Usage, store the data in SQLite, and have a test (optionally) fail when it takes more memory than it did in the past
- Do the rewrite as originally planned
Tony wasn't too keen on the idea of requiring SPL for the test suite, but felt strongly that the current version of run-tests.php is 'really hard to maintain and improve'. Perhaps the new version should be made available alongside the old until it's deemed stable. Of the options Sebastian gave, Tony much preferred the rewrite, and offered to help him with it.
Lukas Smith had missed the original exchange and asked whether this was about testing for performance regression. Either way, he personally didn't feel it should be possible to disable SPL. Tony agreed, but pointed out that this was off-topic. Pierre-Alain Joye thought otherwise, particularly if SPL is likely to be a requirement for the test suite. Perhaps it would make more sense to move the parts of SPL that could be considered as 'core' from ext/spl to the Zend Engine.
Jani Taskinen came close to saying 'something involving smoke and carcass', but ended up agreeing with Pierre that core SPL elements belong in Zend. He was completely against having exceptions to rules; an extension should always be an extension, and any part of it that should always be available belongs in the core.
Short version: The test suite comes under the microscope.
TLK: Arrays vs stdClass
Lukas had never really appreciated all his complex types being returned as
stdClass instances. Although it's possible to override this with
custom classes, he'd much prefer to see 'simple boring arrays' there.
This had become important to him recently because APC doesn't seem to be
optimized for handling objects and, more specifically, appears to be
downright buggy when it comes to handling stdClass. Lukas
referenced an existing bug
report to demonstrate the problem. He went on to ask Dmitry Stogov if it
would be possible to add another option to the SoapServer class
to force it to return an array rather than a stdClass object.
Dmitry thought ext/soap was complex enough already and wrote that he didn't want to increase that complexity with additional options, particularly given that simple object/array conversion brings no gain. Besides, how was this related to APC?
Lukas pointed out that there are many more PHP functions for array manipulation than for object manipulation. Considering the arbitrarily complex structures that can be returned from a SOAP request, a userland cast to array is not only not trivial, but will impact performance. His immediate issue was that he wanted to cache SOAP responses in APC; it seemed buggy to him, and at the very least would require serialization, which again would add overhead.
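To make Lukas' point about the userland cast concrete, here is a minimal sketch of what such a conversion might look like. The helper name soap_result_to_array() and the sample response are hypothetical; nothing like this exists in ext/soap. A plain (array) cast only converts the top level, so nested stdClass instances need a recursive walk, and that walk is the overhead Lukas was worried about.

```php
<?php
// Hypothetical helper (not part of ext/soap): recursively convert
// stdClass instances, at any depth, into plain arrays.
function soap_result_to_array($value)
{
    if ($value instanceof stdClass) {
        $value = get_object_vars($value);
    }
    if (is_array($value)) {
        foreach ($value as $key => $item) {
            $value[$key] = soap_result_to_array($item);
        }
    }
    return $value;
}

// Stand-in for a decoded SOAP response (invented for illustration).
$response = new stdClass;
$response->id = 7;
$response->items = array(new stdClass, new stdClass);
$response->items[0]->name = 'first';
$response->items[1]->name = 'second';

$asArray = soap_result_to_array($response);
echo $asArray['items'][1]['name'], "\n"; // prints "second"
```

Every element of the structure is visited once, so the cost grows with the size of the response.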
Dmitry pointed out that changing his API to use arrays rather than objects
would break existing applications, and supporting both forms would increase
complexity and impact performance. He suggested that Lukas simply split the
SOAP reply into simple elements and cache those, rather than the full object,
agreeing that for a general solution he would need to use
serialize(). Despite Lukas' qualms, Dmitry noted that
unserialize() is much faster than a SOAP request.
Stas Malyshev thought an array would probably require serialization anyway,
unless APC is able to store arrays in their pure form? Rasmus Lerdorf
explained that APC is in fact able to memcpy() arrays directly,
leaving Stas to wonder how APC deals with internal hash pointers, which can
only be valid for the current run. Lukas, meanwhile, posted a blog entry with micro-benchmarks
of all the options he could see open to him and asked for feedback, as he was
unsure of his results.
Short version: Objects 1, Arrays 0.
TLK: Late static binding [again]
Picking up where last week's discussion ended, Jochem Maas challenged Bart de Boer's assertion that PHP wouldn't know which of an object's children to target where there is no child class instance. Surely, the child class would refer to the class named in the method call; isn't that the whole point of late static binding?
Jochem followed up by suggesting callee:: or even
lsb:: as a reserved word ('that would make people hit the
docs for sure :-)') before wondering if it might be better to perform
late static binding on self:: and lose back compatibility. He
wondered how much code this would actually break, given the awkwardness of
the code it would be breaking...
Etienne Kneuss simply hoped that dynamic access of a class method, member or
constant would be allowed; he saw no reason to restrict late static binding
to static callstacks. He didn't like the idea of changing the behaviour of
self:: - not only would it break BC, it would also entail a
performance hit.
Bart, meanwhile, thanked Jochem for clarifying the concept of late static
binding for him. He had been going on the assumption that it had something to
do with accessing static variables in child classes. That said, he
didn't think defining a static function in the child class would be a very
efficient approach when it came to complex objects that need to be accessed
frequently. His misconception should probably become a separate feature
request. As for keywords, Bart's fresh understanding led him to suggest
derived:: or extended:: as potential candidates.
Etienne explained gently that late static binding could also be used to
access static properties, and also mentioned why introducing a new keyword is
a Bad Thing ™; what happens when someone has the function
derived() in their codebase? He was fine with re-using
static::, even allowing for potential confusion. Bart didn't
quite get it, reporting that a Google code search of "function derived()"
gave zero results.
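The difference Jochem and Etienne were describing can be sketched as follows. The example uses the static:: semantics that eventually shipped in PHP 5.3; no implementation existed at the time of this thread, and the Model and User classes are invented for illustration.

```php
<?php
class Model
{
    public static function createWithSelf()
    {
        return new self();   // always resolves to Model, where the method is defined
    }

    public static function createWithStatic()
    {
        return new static(); // resolves to the class named in the call
    }
}

class User extends Model
{
}

echo get_class(User::createWithSelf()), "\n";   // prints "Model"
echo get_class(User::createWithStatic()), "\n"; // prints "User"
```

Rebinding self:: to behave like the second method is what would have broken back compatibility, since code that relies on the first behaviour would have changed meaning without any warning.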
Short version: The core team agreed on static:: some months ago. Finding a sane implementation is a different matter.
RFC: SPL and PCRE always on
Following the exchange over the proposed benchmark suite, Marcus posted an RFC whereby both ext/pcre and ext/spl would become 'first class core components' that cannot be disabled. He provided a long-ish list of points for and against doing so, and asked for comments.
Pierre promptly replied that he failed to see what several SPL functions
(namely class_implements(), class_parents(),
iterator_count(), iterator_to_array() and the
spl_autoload_* functions) actually do outside the Zend Engine.
He saw ArrayObject as not only 'a powerful way to work with
arrays', but as an essential fix for issues with object properties and
arrays. Iterators are more usually integral to a language; why should they be
maintained in an extension in PHP? Finally, Pierre saw Marcus' comment about
disallowing his code into the Zend Engine on licensing grounds as a reason to
worry about SPL.
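For readers unfamiliar with the functions Pierre listed, a short sketch of what four of them return. The Shape, Figure and Circle names are invented for illustration; the functions themselves are the real SPL ones.

```php
<?php
interface Shape {}
class Figure {}
class Circle extends Figure implements Shape {}

// Both functions return arrays keyed and valued by class/interface name.
$parents    = class_parents('Circle');    // array('Figure' => 'Figure')
$interfaces = class_implements('Circle'); // array('Shape' => 'Shape')

$it    = new ArrayIterator(array('a' => 1, 'b' => 2, 'c' => 3));
$count = iterator_count($it);             // 3
$copy  = iterator_to_array($it);          // array('a' => 1, 'b' => 2, 'c' => 3)

echo implode(',', $parents), ' ', implode(',', $interfaces), ' ', $count, "\n";
```

The first two report on the class hierarchy, which is why Pierre considered them Engine territory; the last two simply walk an iterator.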
Stas wrote simply that he didn't know yet of any reason why the two
extensions need to be enabled, as opposed to having the option
to be enabled. Was there anything vital that fails without them? Pierre
reiterated his point about ArrayObject in particular, alongside
the rest of the SPL functions already listed that implement missing elements
in the Zend Engine. Stas asked him to elaborate about those 'array bugs in
object properties', and Pierre pointed him to a
blog post he wrote some six months ago on the matter. Having read it, Stas
still didn't understand where the bug lay. If __get() doesn't
return by reference, it can't be modified - that's how it's supposed
to work. If it returns by reference, it works anyway. Pierre suggested that
Stas check the bug database, where there are 'numerous reports about this
problem', and someone named Frode Moe backed him with a link to his own bug report on the
issue. Pierre explained to Stas that it's far from obvious that the current
behaviour is correct; therefore it is not correct. However, his point had
been that the only workaround - which he regards as a core feature - is in
the SPL extension rather than in the core. Stas remained unable to see any
problem, and was therefore unconvinced of the importance of
ArrayObject.
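Stas' description of the __get() behaviour can be illustrated with a small sketch. The two classes are hypothetical and are not taken from Pierre's blog post or from the bug reports. When __get() returns by value, the append below operates on a temporary copy and PHP raises a notice; when it returns by reference, the stored array is modified.

```php
<?php
class ByValue
{
    private $data = array('list' => array());

    public function __get($name)
    {
        return $this->data[$name];   // returns a copy
    }
}

class ByReference
{
    private $data = array('list' => array());

    public function &__get($name)
    {
        return $this->data[$name];   // returns a reference to the stored array
    }
}

$v = new ByValue;
@($v->list[] = 'x'); // appends to a temporary copy; the notice is suppressed here

$r = new ByReference;
$r->list[] = 'x';    // appends to the array held inside the object

echo count($v->list), "\n"; // prints 0
echo count($r->list), "\n"; // prints 1
```

Whether the first case is a bug or simply surprising is what Pierre and Stas disagreed on.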
Marcus clarified things for Stas; the inability to change the function
signature was the perceived problem. However, he was with Stas on this issue,
explaining to Pierre that ArrayObject is neither a core feature
nor something that belongs in the Zend Engine. A better solution for the
problem described in Pierre's blog would be to use proxies - (object,
property) or (object, array index) - or perhaps a new interface such as
ArrayAccessByRef. An IteratorByRef interface with a &__current() method
might also be interesting. Proxies could be written in userspace code, if it were possible
to overload __get() and __set(). He was thinking of
doing something along those lines in pecl/spl_types; perhaps Engine
support for get/set overloading would come later. Such interfaces would
clearly belong to the Engine, as do two of the functions on Pierre's current
list, namely class_implements() and
class_parents(). The iterator_* functions and
spl_object_hash() are the OO equivalent of functions in
ext/standard and have nothing to do with Engine internals, but the
right place for the spl_autoload_* functions is open to
discussion.
It had become clear to Marcus that he and Pierre had completely opposing
views about the role of the Zend Engine. He believed there should be better
separation between code and the scripting engine; he'd rather move things out
of the Engine than into it. Pierre, on the other hand, seemed to want
essential core elements and anything influencing them to be part of the
Engine itself. To Marcus, the problem with this approach is in knowing where
the limit lies. Finally, regarding his comment over licensing, many other
things had remained separate from the Zend Engine for that reason in the
past. Error reporting and spprintf(), for example, had resulted
in 'a bunch of function pointers inside the engine', and TSRM remains
completely separate.
Pierre argued again that standard functionality should belong in the core of the language, and Marcus reiterated his own view that SPL only facilitates aspects of PHP that the Engine allows. Everyone else had moved on to the question of PCRE. Ilia Alshanetsky argued that 'other, rather important extensions' depend on PCRE, such as ext/filter. The same could be said for SPL. Did this count as one of Stas' "need" reasons? Stas replied that it's possible to run PHP without ext/filter.
Alexey Zakhlestin thought mb_ereg() (powered by the
oniguruma library in ext/mbstring) a strong contender when it
came to regex options, particularly given that it's faster than
ext/pcre. Marcus pointed out that the decision to obsolete
ext/mbstring in PHP 6 had already been made, and also that the PHP
team have no control over the license there. Alexey saw this as a challenge;
perhaps he should write ext/oniguruma, and keep it completely separate
from mbstring.
Derick Rethans finally put in his vote for effectively making both SPL and PCRE part of the core, pointing out that this has pretty much been agreed for PCRE in PHP 6 in any case. Jani contented himself with double-checking that SPL/PCRE dependency will only occur in PHP 6, and not in any version prior to that.
Short version: Definitions of 'core' and 'engine' need a review.
FIX: Class comparison
One Mark Sanders wrote to internals@ to highlight 'an odd error message'. He'd been able to trigger the fatal error message "Nesting level too deep - recursive dependency?" across three separate PHP_5_2 versions using the following code:
<?php
|
Although there obviously is recursion there, Mark thought this particular
example should work without error. Tony disagreed, pointing out that the
object comparison in it would lead to endless recursion. Richard Lynch didn't
see why, but Paweł Stradomski explained the difference in behaviour
between a shallow object comparison (same instance) and a deep object
comparison (member by member), which he believed had been introduced in PHP
5.1. Mark replied that the same error is triggered under PHP 4. He went on to
point out to Tony that, although he'd admittedly stumbled across the behaviour
when doing 'something very odd which would not work anyway', the code
shouldn't give an error at all in theory. He was also able to trigger the
error by comparing $this with itself in his test()
function, or by comparing either class with itself. Tony asked if he had any
suggestions for improvement.
Robert Deaton repeated Paweł's point, that == will test
every property in the object:
$this == $this->c1->c2 will compare $this->c1 == $this->c1->c2->c1, which will in turn compare $this->c1->c2 == $this->c1->c2->c1->c2, which in turn compares $this->c1->c2->c1 == $this->c1->c2->c1->c2->c1
Tony wrote much the same, only less clearly. Daniel Penning pointed out that it might be a good idea to do a shallow comparison before the member-by-member comparison, and this would prevent deep recursion in most cases. Mark had already made the same suggestion to Tony off-list. Christian Schneider thought it wouldn't solve the general problem, but might be worth doing. Tony was surprised to find the check wasn't already there, and promptly added it.
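Mark's original script is not reproduced here, so the following is only a minimal sketch of the situation under discussion, using a hypothetical Node class. It shows the shallow (===) check, the == comparison of an instance with itself that the added same-instance check lets succeed, and leaves the genuinely recursive case commented out.

```php
<?php
class Node
{
    public $peer;
}

$a = new Node;
$b = new Node;
$a->peer = $b;   // each object refers to the other,
$b->peer = $a;   // so a member-by-member comparison never bottoms out

var_dump($a === $a); // bool(true): same instance
var_dump($a === $b); // bool(false): different instances
var_dump($a == $a);  // bool(true): same-instance check short-circuits the deep comparison

// var_dump($a == $b);
// Left commented out: comparing two distinct instances still has to compare
// $a->peer with $b->peer, which leads back to $b == $a, and so on. This is
// expected to end in "Nesting level too deep - recursive dependency?".
```

This matches Christian's remark that the shortcut does not solve the general problem; it only removes the error for the like-with-like case.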
Short version: Comparing like with like doesn't break anything any more.
FIX: Persistent MySQL connections
One Peter Christensen wrote to internals@ to complain that he had been
struggling 'for several years now' with the warning message "x
is not a valid MySQL result resource", which seemingly pops up only a
few times a day on his very busy site. Peter had spent some time trying to
track down the source of the error, and had verified the correctness of the
code. He had eventually been forced to ignore the problem due to its rarity.
Recently it had become inexplicably less rare, to the extent that his boss
had granted him indefinite time to spend on resolving the problem. He had now
tracked it down to the active_result_id field within the
php_mysql_conn struct. This field wasn't cleared when a
persistent MySQL connection was pulled from the list. If another query was
performed on a connection that happened to have been given the same resource
id, the original query was freed.
Peter went on to give a
detailed analysis and code sample to reproduce the problem, which showed
that the guilty field was only cleared during
php_mysql_do_connect() when the connection was not persistent,
or when a persistent connection was not found. He attached a patch to fix the
issue by always clearing the field at that point, adding that the bug affects
all versions of PHP.
Tony checked with MySQL maintainers Georg Richter and Andrey Hristov before committing the patch. There were no objections.
Short version: Persistent connections just got a bit more reliable.
CVS: php_admin holds its own
Changes in CVS that you should probably be aware of include:
- In ext/json, bug #41567 (json_encode() double conversion is inconsistent with PHP) was fixed [Ilia]
- In ext/oci8, bug #41594 (Statement cache is flushed too frequently) was fixed [Tony]
- In ext/simplexml, bug #41582 (SimpleXML crashes when accessing newly created element) was fixed [Tony]
- A new macro, SET_VAR_ASCII_STRINGL(), was introduced in CVS HEAD [Tony]
- In ext/pdo, bug #41596 (Crash inside pdo_pgsql on some non-well-formed SQL queries) was fixed [Ilia]
- Core bug #41600 (url rewriter tags don't work with namespaced tags) was fixed [Ilia]
- Zend Engine bugs #41608 (segfault on a weird code with objects and switch()) and #41561 (Values set with php_admin_* in httpd.conf can be overwritten with ini_set()) were fixed [Tony]
- In the GD library, bug #41630 (segfault when an invalid color index is present in the image data) was fixed [Pierre]
- An earlier fix for bug #41504 (json_decode() incorrectly decodes JSON arrays with empty string keys) was merged to CVS HEAD [Tony]
- Zend Engine bug #41640 (get_class_vars produces error on class constants) was fixed [Johannes Schlüter]
In other CVS news, Dmitry complained that the test suite was completely broken after Jani made it possible to pass shared extensions to run-tests.php. PHP now tried to load the extensions twice, once from the INI file and once from the command line. Dmitry liked the basic idea, but wrote that 'it doesn't work yet'. Jani replied that he'd been afraid this would happen, and asked whether there was any way to suppress the warnings.
Marcus vetoed that approach, and suggested that there should be a simple check for loaded extensions. Any module already loaded would then be excluded from being re-loaded. Jani had a better idea; duplicate entries shouldn't be allowed in main/php_ini.c. Either that, or perhaps there should be a cleanup of duplicate module entries in the Zend Engine?
Dmitry vetoed that approach; he didn't want to modify PHP code to
support duplicate extension directives, particularly since this didn't solve
the twin problem of INI directives trying to load non-existent modules. He
suggested making a copy of php.ini without any extension-related
directives, and using it in preference to the original when running tests.
Jani agreed that this simple solution was better, and later added a new
function, php_ini_loaded_file(), which returns the path to the
php.ini currently in use. He then used it to fix the test suite
issues, noting that although it would have been possible to do this without
adding a new PHP function, it definitely made it easier. Dmitry - rather late
in the day - asked Jani to talk to Marcus, who had raised some objections
against this solution off-list.
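Dmitry's suggestion, combined with the new function, might look something like the sketch below. The helper strip_extension_directives() and the sample INI text are hypothetical and are not the code Jani committed to run-tests.php; only php_ini_loaded_file() is the real function described above.

```php
<?php
// Hypothetical helper: remove extension-loading directives from php.ini text,
// leaving every other setting (including extension_dir) untouched.
function strip_extension_directives($iniText)
{
    return preg_replace('/^[ \t]*(zend_)?extension[ \t]*=.*$\R?/mi', '', $iniText);
}

// Invented sample input for illustration.
$sample = "memory_limit=128M\n"
        . "extension=soap.so\n"
        . " zend_extension=/opt/xdebug.so\n"
        . "extension_dir=/usr/lib/php\n";
$clean = strip_extension_directives($sample);
echo $clean; // the two extension-loading lines are gone

// php_ini_loaded_file() returns the path of the loaded php.ini,
// or false when no php.ini file was loaded.
$path = php_ini_loaded_file();
echo $path === false ? "no php.ini loaded\n" : "loaded: $path\n";
```

A test runner could write the cleaned text to a temporary file and point the PHP binary under test at that copy, so that extensions passed on the command line are not loaded a second time.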
Nuno Lopes followed up a commit from Ilia by applying a fresh copy of the
generated source file url_scanner_ex.c to CVS, with the message
"fix gcov build for the 100th time". Ilia asked him precisely what the
gcov build needed that his re2c wasn't doing, and
Tony explained that it needs full paths rather than just filenames. Jani and
Nuno both wrote that this should happen automatically during make
test; the problem only arises when someone runs re2c
manually. Marcus, however, pointed out that there are some .re files
in the PHP source tree that are only generated manually, for example in
ext/date and ext/pdo. Jani wondered aloud why there are no
build rules for these?
Short version: There are now two ways to get the path to the current php.ini.
PAT: Unicode shortcuts
Tony proposed a small patch adding two new command line switches to CVS HEAD:
-U - turn Unicode mode on
-N - turn Unicode mode off
Although it is of course already possible to type -d
unicode.semantics=1 or -d unicode.semantics=0,
Tony would prefer to have shortcuts available in both the CLI and CGI
SAPIs at some point, assuming there were no objections.
Pierre suggested a slightly different syntax, introducing a single shortcut rather than two:
long: --unicode (0|off)/(1/on)
short: -u (0|off)/(1/on)
but there was no further feedback, and the patch appears to have been forgotten.
Tony applied an ext/oci8 patch from Christopher Jones. It enables the statement cache for non-persistent Oracle connections.
And finally, Wietse Venema offered up a 30-line make depend script
that he'd 'morphed from Perl' into PHP. He added, 'If I overlooked an
already existing way to minimize PHP build time, without the risk of using
out-of-date .lo files, then you can ignore this post.'
Short version: Don't ignore Wietse's post. Or Tony's CLI switch.
