Zend Weekly Summaries Issue #363

TLK: Signature overloading [continued]
REQ: Lars
PAT: Large file support
TLK: A boost for Solaris?
TLK: Exception policy
TLK: T_IMPORT vs T_USE [again]
TLK: Object arithmetic [continued]
NEW: PHP 5.2.5 RC1
TLK: Class resolution
CVS: Freeze
PAT: OnUpdateUTF8String

14th October – 20th October 2007

TLK: Signature overloading [continued]

Marcus Börger, back from wherever he’d been, caught up with Hans Moog’s
thread last week about method overloading by signature. He wrote simply that
if Hans had such a patch he should post it so the team could take a look, and
added that the main reason return type hints had been turned down was the
performance impact they bring. Hans explained that he’d merely wanted to test
the waters before going to the trouble of adapting the patch to apply cleanly
in CVS. His own team’s benchmarks showed that method calls were roughly 0.1%
slower with the patch, but they’d found real-world usage faster overall
because it enabled them to lose the overhead of custom checks for every input
field.

Alexey Zakhlestin was still arguing with Hans about whether the behaviour he
was trying to introduce was backward compatible. Hans retorted that signature
checking in PHP 5 is under-used because PHP doesn’t allow method overloading.
The current state of affairs is an unhappy mixture of strict OO principles
and loose typing, and there’s no way to use them together. It became evident
during the course of this exchange that Hans saw signature checking as a
low-impact security measure…

Christian Schneider wondered aloud whether he was ‘the only one here who
thinks that performance is not the major issue with this approach?’ He
was very much against encouraging the style of programming that would be used
alongside enforced typing expectations of any kind, and saw support for the
feature as bloat at best. At worst, it could lead to buggy code, since PHP’s
automatic type juggling can spring surprises. Christian imagined having to
check every piece of input data repeatedly to guard against those surprises.
He didn’t like that image much. Hans argued that Christian wouldn’t need to do
this unless he wanted to force a strict API; he could code ‘the old way’
and check the parameters manually.

Rasmus Lerdorf agreed with Christian that ‘having to sit and count
arguments and figure out the types in order to determine which actual method
is being called’ would make debugging virtually impossible, and mentioned
that kcachegrind et al. wouldn’t be of much use. Hans felt that
kcachegrind et al. would adapt, and argued that anyone using type
hints in PHP on a regular basis would understand the need for overloading
type-hinted methods. Rasmus pointed out that the code he usually profiles and
debugs is written by other people. Besides, the Web itself is
inherently untyped, ergo PHP deals intelligently with untyped data all the
time.

Christian wrote that ‘the old way’ doesn’t involve parameter checking in any
case. Automatic type conversion does the right thing 99% of the time, and
only the rest is handled (not checked) manually. He didn’t believe that type
hints make applications more robust, and he didn’t see anything wrong with
using different methods when different signatures were needed; outside of
anything else, it’s much easier to debug. An API should always be simple, and
a simple API shouldn’t need ‘whips and chains’.

Stas Malyshev pointed out that the purpose of signature overloading in other
languages is not input type control, and explained that the
performance hit arises because PHP doesn’t have static typing. Any
calculations would need to be done at runtime. This in turn would make the
code less transparent, i.e. debugging and maintenance would be more complex.
Stas wasn’t convinced that the added complexity would serve a real need,
although Hans was welcome to prove otherwise. He also made the point that an
unchecked wrong type in the input is usually enough to make a PHP application
die with a fatal error anyway; checking it would only make it die a couple of
lines earlier. He didn’t understand why Hans would use strict signatures at
all if he didn’t want typed arguments, and using inheritance for overriding
objects with incompatible signatures was simply wrong. Typed arguments are
widely used, in Stas’ experience, but if Hans believed parameter typing would
free him from the need for input validation, ‘IMHO you are on a wrong
way.’
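Stas’ point about runtime cost is easier to see in code. The following is a minimal userland sketch (the Formatter class and its method names are hypothetical, not anything proposed in the thread) of what signature overloading amounts to in a dynamically typed language: the matching variant can only be picked by inspecting the argument types on every call.

```php
<?php
// Hypothetical sketch: emulating signature overloading in userland PHP.
// With no static types, the "signature" has to be computed from the
// runtime argument types on every single call.
class Formatter
{
    public function format()
    {
        $args = func_get_args();
        // gettype() yields names such as "integer", "double" or "string"
        $signature = implode(',', array_map('gettype', $args));

        switch ($signature) {
            case 'integer':
                return $this->formatInteger($args[0]);
            case 'string,integer':
                return $this->formatPadded($args[0], $args[1]);
            default:
                throw new InvalidArgumentException("No variant for ($signature)");
        }
    }

    // The alternative Christian preferred: distinct, plainly named methods.
    public function formatInteger($number)
    {
        return sprintf('%d', $number);
    }

    public function formatPadded($text, $width)
    {
        return str_pad($text, $width, '.');
    }
}

$f = new Formatter();
echo $f->format(42), "\n";       // dispatches to formatInteger()
echo $f->format('ab', 4), "\n";  // dispatches to formatPadded()
```

Calling formatInteger() or formatPadded() directly skips the type inspection altogether, and it also keeps a profiler’s call graph readable, which was Rasmus’ concern.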

Having read Hans’ posts, Marcus implicitly agreed with Stas’ verdict (don’t
worry, regulars, it was only implicit). He also queried Hans’
benchmarking, given that anything under 2% is close to the measurable limit;
you’d need to count assembler instructions using callgrind to
come up with a figure like 0.1%. Finally, the whole point of type hinting in
PHP was to help developers generate more helpful error messages; they were
primarily an aid for the exception model, and certainly not a replacement for
input validation.

Lukas Smith summarized the entire team’s response in his own: ‘This
feature isn’t PHP… or rather, it’s a solution to a problem that does not
exist in PHP… for good reason.’ Richard Quadling rather ruined that
end-of-thread moment by announcing that he’d find signature overloading
extremely useful and thought it a great feature.

Someone named Umberto Salsi wrote to inform everyone of the existence of PHPLint, ‘a PHP parser and validator
that performs a static analysis of the source, ensuring the safe handling of
types’. It seems Hans was already familiar with the tool; he agreed that
it is very useful when you want strict typing. However, he’d prefer to be
able to easily overload functions, since this is one of the basic features
of an OO programming language. Still, if only he and Richard liked the idea,
he’d drop his request. PHP users Daniel T. Gorski and Ken Stanley promptly
wrote that no, they liked it too. Stas contented himself with a note that
PHPLint seemed to him ‘a better solution than messing with the
language’, and Marcus was impressed enough to suggest that Umberto
showcase the tool at future conferences.

Short version: The ensuing ‘my code is better than your code’
exchange ran for another 20 posts or so, but I think you’ll get the
picture.

REQ: Lars

Short version: Somebody give that man an account quick!

PAT: Large file support

Wez Furlong aroused comment when he posted a patch to fix an antique
bug, #27792 (Functions fail on large files (filesize(), is_file(), is_dir())),
by “promoting” file sizes that overflow LONG_MAX to
double rather than long in various file manipulation
and streams functions. He noted in his message that platforms other than Linux,
Solaris, FreeBSD and OSX might need different CFLAGS to work correctly.
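The patch itself is C inside PHP, but the idea it applies is already visible in userland: when an integer no longer fits in a C long (exposed to scripts as PHP_INT_MAX), PHP represents it as a float instead of letting it wrap to a negative number. A minimal sketch; the variable names are illustrative only, and this is not Wez’s code.

```php
<?php
// Userland illustration of promotion from long to double; the same idea
// the patch applies to file sizes, not the patch itself.
$largest = PHP_INT_MAX;     // 2147483647 on 32-bit builds, i.e. 2 GiB minus one
$next    = $largest + 1;    // no longer fits in a long, so PHP yields a float

var_dump(is_int($largest)); // bool(true)
var_dump(is_float($next));  // bool(true): promoted rather than wrapped

// A double stores every integer up to 2^53 exactly, so byte counts far
// beyond 2 GiB survive the promotion without rounding.
$fourGiB = 4 * 1024 * 1024 * 1024;
var_dump((float) $fourGiB === 4294967296.0); // bool(true)
```

On 32-bit builds the long limit sits at 2 GiB minus one byte, which is why sizes at or beyond that point need a representation other than long.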

Debian’s Sean Finney wondered why Wez didn’t just use SIZE_MAX
instead, and recommended using getconf LFS_CFLAGS to pick up the
flags on ‘other (weirder) platforms’ – always assuming
getconf exists there.

Security expert Stefan Esser looked over the patch and noted that compiling
PHP with large file support breaks binary compatibility, since one of the
globals has a differently-sized stat struct depending on whether
LFS is enabled or not. Sean backed him, adding that this would be a particular
problem for anyone using proprietary software built against the original API.
RedHat’s Joe Orton pointed out that the patch would also change the API of
any library or application linked against PHP, causing PHP to see all
instances of off_t as 64-bit; ‘this breaks structure offsets
etc very badly’. He recommended using 64-bit specific defines and types
throughout to avoid the issue.

Stas simply asked Wez whether the issue wouldn’t be better resolved at stream
level rather than at function level. He also had qualms about the move to
size_t, feeling that it might be ‘rather dangerous’ to
change binary structures. Wez explained that his patch actually does
resolve the issue at stream level; this hadn’t been clear in his post, which
had stressed that some extensions may need to adapt certain functions. The
patch turns on LFS in the headers, which in turn promotes the
off_t and size_t types used by streams to the
64-bit versions. The rest of the patch was all about adapting built-in PHP
functions to make them capable of coping with numbers that are too big to fit
into a long. Wez agreed with Stas about the changes to binary
structures; that was exactly why he’d written the patch against the new
development branch and not the PHP_5_2 stable branch.

Sean still had some concerns, but these proved to stem from his lack of
familiarity with PHP internals (and/or to cast some light on Debian
internals, depending on your way of viewing the world).

Stas finally got around to looking at the patch itself, and suggested that it
might be better to have the defines in php-config.h (read: + .in) than to put
them in CFLAGS. That way, anyone including PHP headers would at least be
guaranteed to always get the same result on the same platform, preventing
incompatible module builds. He wanted to know, though, whether the defines
would guarantee the same results across all libc versions.

Wez concurred on the need to move the defines; he’d assumed that
CFLAGS defines would trickle down to php-config.h, but
apparently not so. However, he neglected to answer Stas’ question about the
effects of different C libraries.

Short version: It needs thorough testing before it has a hope.

TLK: A boost for Solaris?

First off, Rob Thompson wrote to internals@ to say that he’d been thinking
about Solaris support for PHP and the problems of reproducing bugs under
different Solaris versions. The University that employs Rob uses both PHP and
Solaris heavily, and as a result Rob had spent time trying to improve
matters. He wanted to know whether there would be any interest from the
PHP development team if he were to set up a series of Solaris systems running
different versions and give them access to those systems for testing/debugging
purposes.

Someone named Brian A. Seklecki recommended that Rob go to Pkgsrc for help with Solaris instead, but
Rob made it clear that he actually wanted to help improve PHP if humanly
possible. Nuno Lopes wrote that he’d be interested in Rob’s setup if he could
convince his University to support it; his own University has been
discontinuing its Solaris servers, and he only has access to ‘an ancient
Solaris 7 server’. He’d be even more interested in seeing Rob’s existing
bugfix patches.

Second up, a Jean Jayet from Sun Microsystems wrote to say that his team are
integrating PHP 5.2.4 into their new Open Solaris release. They had run the
PHP test suite there and on a RedHat 4 system for comparison, and had –
unsurprisingly – found more failed tests under Solaris than under RH. Jean
wanted to know if there are test results grouped per OS anywhere on php.net;
he hoped to find out the status of each failed test. Derick Rethans sent him
to the QA mailing list, but noted
that results aren’t actually grouped there. Jean looked, but found it
unhelpful; he couldn’t easily find anything specific to Solaris and didn’t
have time to plough through all the posts. He also noticed that failed tests
could vary, and wondered if there were some way to know which tests should
always pass. Derick explained that every failing test falls into that
category, since tests that don’t apply to a given configuration are skipped.

Jean asked next what action he should take when faced with failing tests. Zoe
Slattery pointed Jean to the bug reporting system and asked him simply to
check that the bug report didn’t already exist before posting the relevant
.diff and .out files. Tony Dovgal mentioned that it is also
possible to send a patch when there is obviously a missing workaround for
Solaris-specific behaviour, or to report “wrong” tests to the php.qa mailing
list. Nuno went one better and introduced Jean to the gcov page, pointing out that this was
actually the ‘reference’ build Jean had hoped to find. Marcus wrote that it
would be good to set up a database to collect information from trusted
gcov machines; Nuno replied that this is on his TODO, but he has
no time to do it. That said, Nuno asked whether Sun Microsystems would be
interested in setting up a gcov machine to contribute to this
when he does find time. Jean thought so, given that Sun would be
interested in sharing their PHP 5.2.4 results on Open Solaris (SXDE). Jean
also posted their test results and asked for help in analyzing the reasons
for the existing test failures prior to posting any bug reports. This
effectively ended the thread; there are no resources other than those already
mentioned, and only Jean knows the SXDE platform.

Short version: A lot of ‘maybe’s in there…

TLK: Exception policy

Lukas Smith wanted clarification regarding the policy of throwing exceptions
from the PHP core. As he recalled it, the original pre-PHP 5 decision had
been that core exceptions should only be thrown on constructor errors, and
extensions should allow users the option to explicitly enable an exception
mode. This had come up now because he’d noticed that the new OO
imagick extension is ‘quite exception throwing happy’; although
imagick is in PECL and not part of the core, Lukas felt that the
question of policy should be addressed. Had he remembered correctly? Was this
still the policy? Was there even a policy, and if not should there be one?
Either way, there should be something about this in the coding standards
file.

Derick believed the original policy should hold, but Tony saw nothing wrong
in using exceptions throughout OO extensions. Moreover, he disliked the idea
of having the error type controlled by a per-extension function, and didn’t
feel this should be part of any coding standard. Johannes Schlüter
agreed that having the option to alter the error mode at runtime complicates
code, and felt that the switch should be avoided in future. Lukas agreed that
it could be problematic, but defended the decision to have such a switch in
PDO, which was built this way from the ground up.

Larry Garfield promptly wrote that he’d had issues in the past with
non-exception error mode handling in PDO, but Lukas pointed out that wrong
error values indicate a bug rather than a feature. Larry explained that he
hadn’t been able to make a viable bug report from the code he’d been using.
He’d found that the exception based checking made the code much nicer to deal
with in any case, and saw the error mode – with its variable error array size
– as flawed. Whatever. Larry’s main thrust was that exceptions are popular
precisely because they provide a clean and powerful mechanism; but he agreed
that it should remain possible to use PHP without having
try/catch blocks scattered through the script.
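For readers who haven’t met it, the PDO switch under discussion is the PDO::ATTR_ERRMODE attribute. The sketch below assumes the pdo_sqlite driver is available and queries a deliberately non-existent table (the name is made up) so the same failure is reported both ways.

```php
<?php
// Assumes pdo_sqlite is installed; an in-memory database needs no setup.
$pdo = new PDO('sqlite::memory:');

// Error-code mode: a failing call returns false, and errorInfo() describes
// the problem as [SQLSTATE, driver error code, driver message].
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_SILENT);
$result = $pdo->query('SELECT * FROM no_such_table');
$info   = $pdo->errorInfo();

// Exception mode: the same failure throws a PDOException instead, so the
// caller needs a try/catch block.
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$thrown = false;
try {
    $pdo->query('SELECT * FROM no_such_table');
} catch (PDOException $e) {
    $thrown = true;
}

var_dump($result === false, $info[0], $thrown);
```

The default mode has changed between PHP versions, so code that relies on one behaviour should set the attribute explicitly rather than assume it.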

Short version: Inconclusive.

TLK: T_IMPORT vs T_USE [again]

Sebastian Nohn chased the Zend team for the results of their endeavours to
support the use of reserved words in method names, function names and class
names. Stas responded with the news that they had a simple patch to allow
this for import, but that it was still undergoing testing. Andi
Gutmans noted that applying the patch would mean breaking tokenizers and
syntax highlighters; they’d need to actually parse the keyword to gain
context. For this reason, he was beginning to think it might be best to stick
with the rule that a keyword is always a keyword. In turn, this would mean
dropping import and adopting use.

Pierre-Alain Joye pointed out that this would also mean keywords can’t be
added in minor PHP versions, and added – in case anyone had missed it – that
having import as a keyword would break a lot of existing
applications. He’d rather have Stas’ patch or move to use, which
returned fewer than ten Google codesearch results. Johannes mentioned that
use has been a reserved keyword in PHP 5 since inception. Then
again, without the patch the keyword namespace still might break
existing code. Pierre hadn’t realized that use was reserved. He
wondered, in that case, what was holding up the switch? The
namespace keyword was far less problematic than
import, being used mainly where PHP 5 provides a native
solution.

Stas thought it a good moment to mention that the patch in question
only supports the import keyword, before anyone got
carried away. Marcus wondered why import had even been
considered, given that use had been reserved as a keyword long
ago. And anyway, package was a better name than
namespace… Stas pointed out that reserving
package wouldn’t resolve any of the problems under discussion; a
lot of applications use it as an identifier, PEAR not least among them.

Short version: T_USE wins hands-down.

TLK: Object arithmetic [continued]

Stas picked up on Rob Richards’ query from last week about
inappropriate typing. He wrote that he wasn’t sure how to fix this; there
should be some type used, and many objects wouldn’t provide a means of
conversion to float. Stas thought it better to explicitly cast
the type, since relying on the default conversion was ‘probably not the
best idea’ in the context of arithmetic operations.

Alexey wondered why SimpleXML couldn’t be forced to cast to
string, leaving the Zend Engine to cast to float.
Stas pointed out that the addition operator requires numeric values.

Rob, ignoring the diversion, wrote that he would have agreed with Stas if it
weren’t for __toString(), and threw him the following example of
the current deficiency:


<?php

$xml = '<root><a>1.1</a><b>2.2</b></root>';
$sxe = simplexml_load_string($xml);
print $sxe->a . " + " . $sxe->b . " = " . ($sxe->a + $sxe->b);
// 1.1 + 2.2 = 3

?>


Besides, Rob didn’t see why objects that happily provide a long
converter shouldn’t provide a float converter.
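Stas’ recommended workaround, an explicit cast before doing arithmetic, looks like this when applied to Rob’s example. A minimal sketch assuming the SimpleXML extension is enabled; going through a string first relies only on the string conversion that Rob’s example shows working.

```php
<?php
$xml = '<root><a>1.1</a><b>2.2</b></root>';
$sxe = simplexml_load_string($xml);

// Cast each element explicitly instead of relying on whatever numeric
// conversion the object supplies for the + operator.
$a   = (float) (string) $sxe->a;
$b   = (float) (string) $sxe->b;
$sum = $a + $b;

printf("%.1f + %.1f = %.1f\n", $a, $b, $sum); // 1.1 + 2.2 = 3.3
```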

Short version: Just when you thought all __toString() related
issues were resolved…

NEW: PHP 5.2.5 RC1

Ilia Alshanetsky, as Release Master for the PHP 5.2 series, announced the
availability of PHP 5.2.5 RC1 for testing.

Short version: Keeping the stable version stable is just as
important as the bleeding-edge stuff. Please test.

TLK: Class resolution

Chuck Hagenbuch sparked a long and convoluted discussion when he complained
that class resolution is wrongly ordered when autoloading namespaced code.
Chuck had three files (attached to his mail). The demo.php file includes
test.php directly and
test_exception.php through its __autoload() function;
both files declare the namespace Test. When
Test::Tester->fail() is called from the demo file, the
Tester class should throw a Test::Exception – in
theory. In practice, demo.php resulted in a fatal error:
Uncaught exception 'Exception' in test.php:7. Chuck reasoned
that __autoload() was never called, and the class resolution
order must be:

  • does the class exist in the current namespace?
  • if not, does the class exist in the global namespace?
  • if not, try autoload

He believed the autoload should be more finely tuned to avoid having to
explicitly load any redefined global classes:

  • does the class exist in the current namespace?
  • if not, can the class be autoloaded with the current namespace?
  • if not, does the class exist in the global scope?
  • if not, try global autoload
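The two orderings are easier to compare side by side in a toy model. The sketch below is hypothetical and does not call into the engine: class names are plain strings using the `::` separator of the development branch at the time, $declared lists classes already defined, and $autoloadable lists classes an autoloader could supply. Treating ‘try autoload’ in the current order as an attempt on the namespaced name is an assumption.

```php
<?php
// Toy model of class-name resolution inside a namespace (hypothetical).
// $declared: classes already defined; $autoloadable: classes __autoload()
// could load. Names are strings such as "Test::Exception" or "Exception".

// Order as Chuck observed it: current namespace, global, then autoload.
function resolve_current(string $name, string $ns, array $declared, array $autoloadable): ?string
{
    $qualified = "$ns::$name";
    if (in_array($qualified, $declared, true)) {
        return $qualified;              // 1. class exists in current namespace
    }
    if (in_array($name, $declared, true)) {
        return $name;                   // 2. class exists globally
    }
    if (in_array($qualified, $autoloadable, true)) {
        return $qualified;              // 3. autoload (assumed: namespaced name)
    }
    return null;
}

// Order Chuck proposed: namespaced autoload is tried before the global scope.
function resolve_proposed(string $name, string $ns, array $declared, array $autoloadable): ?string
{
    $qualified = "$ns::$name";
    if (in_array($qualified, $declared, true) || in_array($qualified, $autoloadable, true)) {
        return $qualified;              // steps 1 and 2: current namespace, loaded or autoloadable
    }
    if (in_array($name, $declared, true) || in_array($name, $autoloadable, true)) {
        return $name;                   // steps 3 and 4: global class, loaded or autoloadable
    }
    return null;
}

// Chuck's scenario: Test::Exception is only reachable via autoload.
$declared     = ['Exception', 'Test::Tester'];
$autoloadable = ['Test::Exception'];
echo resolve_current('Exception', 'Test', $declared, $autoloadable), "\n";  // Exception
echo resolve_proposed('Exception', 'Test', $declared, $autoloadable), "\n"; // Test::Exception
```

In the model, the first order settles on the global Exception before the autoloader is ever consulted, which matches the uncaught 'Exception' Chuck reported; the second order finds the namespaced class first.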

Another way around the problem might be to force users to prefix all global
classes with :: within a namespace; since back compatibility is
already broken with namespaced code, retaining BC isn’t really an issue.

Greg double-checked; was Chuck suggesting namespace specific autoloading? If
so, it sounded as if it could be overly complex. He could see another
solution without the need for that:


<?php

namespace Test;
import Test::Exception;

class Tester {
    public function fail() {
        throw new Exception();
    }
}

?>


Greg recognized that this wasn’t totally intuitive, but felt that an explicit
declaration of any external classes at the top of the file would be a big
bonus from the maintenance perspective.

Chuck explained that he’d been thinking in terms of
A::B::C::Exception vs ::Exception; he hadn’t
thought about the possibility that A::B::Exception might also
exist, and agreed with Greg that this would quickly get ugly. With that out
of the way, he didn’t have any better ideas than Greg’s userland workaround
for the time being, but observed that import combined with
autoloading would affect all subsequent included files rather than just the
immediate file. Perhaps all class names not prefixed with ::
should refer to classes within the current namespace, with no fallback. Greg
concurred that this would be liveable in his opinion, but didn’t have time to
back it up with a patch.

Stas caught up with the conversation at this point. He confirmed that Chuck’s
assumptions about the autoload ordering were essentially correct, but pointed
out that the four-step autoload checks he’d envisaged would make it harder to
use global classes (and therefore harder to convert older applications to use
namespaces). Stas explained the rationale behind the ordering. It was assumed
that the global Exception class was likely to be used more frequently than an
extended Exception class; the fact that an extending class could be called
Fred and still work had also come into it. The resolution rules
were therefore designed to make it easy to work with the most frequent use
case, and straightforward to migrate existing code. Stas saw the ‘always
prefix’ idea as a non-starter, given migration considerations; ‘the use of
global classes is ubiquitous in current applications’. Basically, the
problem wasn’t that namespaced code wouldn’t work with earlier PHP versions.
The problem was that older code needed a simple upgrade path. Stas initially
wrote that it would be better to use require_once where internal
classes are overridden, but then saw Greg’s solution and decided it was
better. Far from being non-intuitive, it was immediately clear from the code
exactly which exception class would be used.

Chuck wasn’t at all sure about the idea that the standard global
Exception class is the one most used in PHP, and cited the Zend
Framework to illustrate his point. In his view, ‘calling exception classes
something other than Exception is kind of like giving up on the
namespace’. That said, he agreed that Greg’s explicit import solution
looked like the best approach – when it came to dealing with global
classes. The problem with using it for anything else was its propensity for
propagation. It also didn’t make a lot of sense to Chuck to have to import
pieces of the current namespace. Stas didn’t see Chuck’s point about
propagation, arguing that the name resolution for a file is defined by the
code and import declarations within that file and ‘import is
just a way to write names shorter’. He also believed that the
Exception class itself is a rare case, in that it’s unusual to
write code where the class names coincide with internal class names. All in
all, Stas didn’t see ‘masking system classes with overloads and allowing
only :: to work’ as a good solution. Outside of anything else, the
:: prefix would lead to an autoload call on each use of an
internal class, overridden or not, which would significantly impact
performance.

Chuck realized that Stas hadn’t run the test code he’d posted and started
again from the top, posting some more code to prove his point. The
implementation as it stood didn’t work
the way Stas had said it did; the next file to be loaded would inherit an
imported Exception override, without explicitly importing it
itself. Chuck added a heartfelt plea for full control over namespaced
elements, pointing out that he currently has the choice of listing every
class used at the head of each file or being at the mercy of any new global
class that comes along. Although he understood the argument that migration to
namespace usage should be simple, he felt that there should be some
encouragement to think about the implications. Everything in a namespaced
file should be relative to that namespace, and any global element explicitly
imported, in exactly the same way as variable scope works in PHP. Chuck
almost ruined this moment of pure poetry by adding that he ‘didn’t have a
grand theory of why variable scope and class scope should match’. He just
thought most people would want namespaces to be as self-contained as possible,
and that would mean explicitly importing external, rather than internal,
elements into a namespace.

Chuck evidently felt rather passionate about this, and a number of emails
went back and forth between himself and Stas. DateTime was of
course mentioned in passing, since Stas insisted that clashes with internal
class names weren’t and wouldn’t be an issue. Chuck went as far as
threatening to write a patch himself (‘scary as it might be for
others’) to force external dependencies to be declared at the head of a
namespaced file. He didn’t get that far, though; he’d tried Stas’ suggestion
of explicitly importing classes in the same namespace, and reported that this doesn’t work. Surely
the whole point was that PHP should be able to add new classes, and third-party
code should be loaded, without affecting namespaced code?

Short version: Something’s not right.

CVS: Freeze

Changes in CVS prior to 5.2.5RC1 that you should probably be aware of
include:

  • Zend Engine bug #42859
    (import always conflicts with internal classes) was fixed in 5_3
    and HEAD [Greg Beaver, Dmitry]
  • Core bugs #42919 (Unserializing
    of namespaced class object fails) and #42722 (display_errors
    setting ignored for E_PARSE and HTTP 500 page) were fixed, the
    latter in 5_2 also [Dmitry]
  • Core bug #43020 (Warning
    message is missing with shuffle() and more than one argument)
    was fixed in the 5_3 and 5_2 branches [Scott MacVicar]
  • In the PHP_4_4 branch only, GD bug
    #43010
    (regression in imagearc with two equivalent angles) was fixed
    [Pierre]

Depending on your choice of RDBMS, the big CVS news of the week was either
the tweaking of the MySQL native driver in the PHP_5_3 branch to allow it
to build under PHP 5.2.*, or the fact that the Firebird extensions finally
have a dedicated maintainer in Lars Westermann. PHP’s only Firebird user,
Lester Caine, should be happier now.

Marcus had a funny half-hour when a CVS checkout fell apart on him. Several
SPL commits to ‘fix brnach (funny broken checkout)’ ensued. Anyone thinking
of searching the PHP bugs database for “brnach” as a result – take it from
one who knows, and don’t.

The only other thing of note was that Lukas now has php-src karma. Who knows,
maybe he’ll be the PHP 5.4 series RM after all…

Short version: Someone gave that man an account quick.

PAT: OnUpdateUTF8String

Christopher Jones of Oracle fame
posted a patch created
by one of their engineers to allow OnUpdateUTF8String() to
work as he thought it should in PHP 6. He asked for feedback about the idea,
but none was immediately forthcoming.

And finally, Dmitry applied a patch offered a couple of weeks ago
by Bill Moran to the SOAP extension in PHP_5_2, PHP_5_3 and CVS HEAD.
The patch fixes bug #42637
(SoapFault: Only http and https are allowed).

Short version: A bit of a wait, but we got there in the end.