Zend Weekly Summaries Issue #363


TLK: Signature overloading [continued]
REQ: Lars
PAT: Large file support
TLK: A boost for Solaris?
TLK: Exception policy
TLK: T_IMPORT vs T_USE [again]
TLK: Object arithmetic [continued]
NEW: PHP 5.2.5 RC1
TLK: Class resolution
CVS: Freeze
PAT: OnUpdateUTF8String

14th October - 20th October 2007

TLK: Signature overloading [continued]

Marcus Börger, back from wherever he'd been, caught up with Hans Moog's thread last week about method overloading by signature. He wrote simply that if Hans had such a patch he should post it so the team could take a look, and added that the main reason return type hints had been turned down was the performance impact they bring. Hans explained that he'd merely wanted to test the waters before going to the trouble of adapting the patch to apply cleanly in CVS. His own team's benchmarks showed that method calls were roughly 0.1% slower with the patch, but they'd found real-world usage faster overall because it enabled them to lose the overhead of custom checks for every input field.

Alexey Zakhlestin was still arguing with Hans about whether the behaviour he was trying to introduce was backward compatible. Hans retorted that signature checking in PHP 5 is under-used because PHP doesn't allow method overloading. The current state of affairs is an unhappy mixture of strict OO principles and loose typing, and there's no way to use them together. It became evident during the course of this exchange that Hans saw signature checking as a low-impact security measure...

Christian Schneider wondered aloud whether he was 'the only one here who thinks that performance is not the major issue with this approach?' He was very much against encouraging the style of programming that would be used alongside enforced typing expectations of any kind, and saw support for the feature as bloat at best. At worst, it could lead to buggy code, since PHP's automatic type juggling can spring surprises. Christian imagined having to check every piece of input data repeatedly to guard against those surprises. He didn't like that image much. Hans argued that he wouldn't need to do this unless he wanted to force a strict API; he could code 'the old way' and check the parameters manually.

Rasmus Lerdorf agreed with Christian that 'having to sit and count arguments and figure out the types in order to determine which actual method is being called' would make debugging virtually impossible, and mentioned that kcachegrind et al wouldn't be of much use. Hans felt that kcachegrind et al would adapt, and argued that anyone using type hints in PHP on a regular basis would understand the need for overloading type hinted methods. Rasmus pointed out that the code he usually profiles and debugs is written by other people. Besides, the Web itself is inherently untyped, ergo PHP deals intelligently with untyped data all the time.

Christian wrote that 'the old way' doesn't involve parameter checking in any case. Automatic type conversion does the right thing 99% of the time, and only the rest is handled (not checked) manually. He didn't believe that type hints make applications more robust, and he didn't see anything wrong with using different methods when different signatures were needed; outside of anything else, it's much easier to debug. An API should always be simple, and a simple API shouldn't need 'whips and chains'.

Stas Malyshev pointed out that the purpose of signature overloading in other languages is not input type control, and explained that the performance hit arises because PHP doesn't have static typing. Any calculations would need to be done at runtime. This in turn would make the code less transparent, i.e. debugging and maintenance would be more complex. Stas wasn't convinced that the added complexity would serve a real need, although Hans was welcome to prove otherwise. He also made the point that an unchecked wrong type in the input is usually enough to make a PHP application die with a fatal error anyway; checking it would only make it die a couple of lines earlier. He didn't understand why Hans would use strict signatures at all if he didn't want typed arguments, and using inheritance for overriding objects with incompatible signatures was simply wrong. Typed arguments are widely used, in Stas' experience, but if Hans believed parameter typing would free him from the need for input validation, 'IMHO you are on a wrong way.'
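To make the runtime cost concrete, here is a minimal sketch of what userland code does today in place of overloading: either one method that inspects its argument on every call (roughly the work an overload resolver would have to repeat at runtime), or separately named methods, as Christian preferred. The Formatter class and its method names are hypothetical and do not come from the thread or from Hans' patch.

```php
<?php

// Hypothetical example: names are illustrative, not taken from the thread.
class Formatter
{
    // One entry point that inspects its argument at runtime, which is
    // roughly the check an overload resolver would repeat on every call.
    public function format($value)
    {
        if (is_int($value)) {
            return $this->formatInt($value);
        }
        if (is_string($value)) {
            return $this->formatString($value);
        }
        throw new InvalidArgumentException('Unsupported type: ' . gettype($value));
    }

    // Separate, explicitly named methods: the alternative to overloading.
    public function formatInt($value)
    {
        return 'int:' . $value;
    }

    public function formatString($value)
    {
        return 'string:' . $value;
    }
}

$f = new Formatter();
echo $f->format(42), "\n";   // int:42
echo $f->format('42'), "\n"; // string:42
```

Note that 42 and '42' take different branches here, which is exactly the kind of surprise Christian was worried about when input arrives from the Web as strings.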

Having read Hans' posts, Marcus implicitly agreed with Stas' verdict (don't worry, regulars, it was only implicit). He also queried Hans' benchmarking, given that anything under 2% is close to the measurable limit; you'd need to count assembler instructions using callgrind to come up with a figure like 0.1%. Finally, the whole point of type hints in PHP was to help developers generate more helpful error messages; they were primarily an aid for the exception model, and certainly not a replacement for input validation.

Lukas Smith summarized the entire team's response in his own: 'This feature isn't PHP... or rather, it's a solution to a problem that does not exist in PHP... for good reason.' Richard Quadling rather ruined that end-of-thread moment by announcing that he'd find signature overloading extremely useful and thought it a great feature.

Someone named Umberto Salsi wrote to inform everyone of the existence of PHPLint, 'a PHP parser and validator that performs a static analysis of the source, ensuring the safe handling of types'. It seems Hans was already familiar with the tool; he agreed that it is very useful when you want strict typing. However, he'd prefer to be able to easily overload functions, since this is one of the basic functions of an OO programming language. Still, if only he and Richard liked the idea, he'd drop his request. PHP users Daniel T. Gorski and Ken Stanley promptly wrote that no, they liked it too. Stas contented himself with a note that PHPLint seemed to him 'a better solution than messing with the language', and Marcus was impressed enough to suggest that Umberto showcase the tool at future conferences.

Short version: The ensuing 'my code is better than your code' exchange ran for another 20 posts or so, but I think you'll get the picture.

REQ: Lars

----- Original Message -----

From: "Lars Westermann" <lars.westermann@privat.dk>
To: <internals@lists.php.net>
Sent: Sunday, October 14, 2007 3:45 PM
Subject: [PHP-DEV] CVS Account Request: lwe

Maintaining an official, bundled PHP extension:
pdo_firebird and maybe php_interbase

Short version: Somebody give that man an account quick!

PAT: Large file support

Wez Furlong aroused comment when he posted a patch to fix an antique bug, #27792 (Functions fail on large files (filesize(), is_file(), is_dir())), by "promoting" file sizes that overflow LONG_MAX to double rather than long in various file manipulation and streams functions. He noted in his message that platforms other than Linux, Solaris, FreeBSD and OSX might need different CFLAGS to work correctly.
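The same promotion is already visible in userland arithmetic, which may help picture what the patch does for file sizes. The sketch below is not taken from Wez's patch; size_as_number() is a hypothetical helper that relies only on PHP's ordinary numeric-string conversion.

```php
<?php

// Integer arithmetic that overflows PHP_INT_MAX yields a float, not a wrapped int.
$max = PHP_INT_MAX;
$overflowed = $max + 1;

var_dump(is_int($max));          // bool(true)
var_dump(is_float($overflowed)); // bool(true)

// Hypothetical helper (not from the patch): return a size as int when it
// fits, otherwise as float, mirroring the promotion described above.
function size_as_number(string $decimalDigits)
{
    // Adding 0 to a numeric string gives an int if it fits, a float otherwise.
    return $decimalDigits + 0;
}

var_dump(is_int(size_as_number('1024')));                      // bool(true)
var_dump(is_float(size_as_number('99999999999999999999999'))); // bool(true)
```

The trade-off is precision: a double represents every integer exactly only up to 2^53, so a promoted size is approximate beyond that point.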

Debian's Sean Finney wondered why Wez didn't just use SIZE_MAX instead, and recommended using getconf LFS_CFLAGS to pick up the flags on 'other (weirder) platforms' - always assuming getconf exists there.

Security expert Stefan Esser looked over the patch and noted that compiling PHP with large file support breaks binary compatibility, since one of the globals has a differently-sized stat struct depending on whether LFS is enabled or not. Sean backed him, adding that this would be a particular problem for anyone using proprietary software built against the original API. RedHat's Joe Orton pointed out that the patch would also change the API of any library or application linked against PHP, causing PHP to see all instances of off_t as 64-bit; 'this breaks structure offsets etc very badly'. He recommended using 64-bit specific defines and types throughout to avoid the issue.

Stas simply asked Wez whether the issue wouldn't be better resolved at stream level rather than at function level. He also had qualms about the move to size_t, feeling that it might be 'rather dangerous' to change binary structures. Wez explained that his patch actually does resolve the issue at stream level; this hadn't been clear in his post, which had stressed that some extensions may need to adapt certain functions. The patch turns on LFS in the headers, which in turn promotes the off_t and size_t types used by streams to the 64-bit versions. The rest of the patch was all about adapting built-in PHP functions to make them capable of coping with numbers that are too big to fit into a long. Wez agreed with Stas about the changes to binary structures; that was exactly why he'd written the patch against the new development branch and not the PHP_5_2 stable branch.

Sean still had some concerns, but these proved to stem from his lack of familiarity with PHP internals (and/or to cast some light on Debian internals, depending on your way of viewing the world).

Stas finally got around to looking at the patch itself, and suggested that it might be better to have the defines in php-config.h (read: + .in) than to put them in CFLAGS. That way, anyone including PHP headers would at least be guaranteed to always get the same result on the same platform, preventing incompatible module builds. He wanted to know, though, if the defines would guarantee the same results across all libc versions?

Wez concurred over the need to move the defines; he'd assumed that CFLAG defines would trickle down to php-config.h, but apparently not so. However, he neglected to answer Stas' question about the effects of different C libraries.

Short version: It needs thorough testing before it has a hope.

TLK: A boost for Solaris?

First off, Rob Thompson wrote to internals@ to say that he'd been thinking about Solaris support for PHP and the problems of reproducing bugs under different Solaris versions. The University that employs Rob uses both PHP and Solaris heavily, and Rob had spent time trying to improve matters in consequence. He wanted to know whether there would be any interest from the PHP development team if he were to set up a series of Solaris systems running different versions and give them access to those systems for testing/debugging purposes.

Someone named Brian A. Seklecki recommended that Rob go to Pkgsrc for help with Solaris instead, but Rob made it clear that he actually wanted to help improve PHP if humanly possible. Nuno Lopes wrote that he'd be interested in Rob's setup if he could convince his University to support it; his own University has been discontinuing their Solaris servers, and he only has access to 'an ancient Solaris 7 server'. He'd be even more interested in seeing Rob's existing bugfix patches.

Second up, a Jean Jayet from Sun Microsystems wrote to say that his team are integrating PHP 5.2.4 into their new Open Solaris release. They had run the PHP test suite there and on a RedHat 4 system for comparison, and had - unsurprisingly - found more failed tests under Solaris than under RH. Jean wanted to know if there are test results grouped per OS anywhere on php.net; he hoped to find out the status of each failed test. Derick Rethans sent him to the QA mailing list, but noted that results aren't actually grouped there. Jean looked, but found it unhelpful; he couldn't easily find anything specific to Solaris and didn't have time to plough through all the posts. He also noticed that failed tests could vary, and wondered if there were some way to know which tests should always pass. Derick explained that this would include all failing tests, since those not applicable to the configuration are skipped.

Jean asked next what action he should take when faced with failing tests. Zoe Slattery pointed Jean to the bug reporting system and asked him simply to check that the bug report didn't already exist before posting the relevant .diff and .out files. Tony Dovgal mentioned that it is also possible to send a patch when there is obviously a missing workaround for Solaris-specific behaviour, or to report "wrong" tests to the php.qa mailing list. Nuno went one better and introduced Jean to the gcov page, pointing out that this was actually the 'reference' build Jean had hoped to find. Marcus wrote that it would be good to set up a database to collect information from trusted gcov machines; Nuno replied that this is on his TODO, but he has no time to do it. That said, Nuno asked whether Sun Microsystems would be interested in setting up a gcov machine to contribute to this when he does find time. Jean thought so, given that Sun would be interested in sharing their PHP 5.2.4 results on Open Solaris (SXDE). Jean also posted their test results and asked for help in analyzing the reasons for the existing test failures prior to posting any bug reports. This effectively ended the thread; there are no resources other than those already mentioned, and only Jean knows the SXDE platform.

Short version: A lot of 'maybe's in there...

TLK: Exception policy

Lukas Smith wanted clarification regarding the policy of throwing exceptions from the PHP core. As he recalled it, the original pre-PHP 5 decision had been that core exceptions should only be thrown on constructor errors, and extensions should allow users the option to explicitly enable an exception mode. This had come up now because he'd noticed that the new OO imagick extension is 'quite exception throwing happy'; although imagick is in PECL and not part of the core, Lukas felt that the question of policy should be addressed. Had he remembered correctly? Was this still the policy? Was there even a policy, and if not should there be one? Either way, there should be something about this in the coding standards file.

Derick believed the original policy should hold, but Tony saw nothing wrong in using exceptions throughout OO extensions. Moreover, he disliked the idea of having the error type controlled by a per-extension function, and didn't feel this should be part of any coding standard. Johannes Schlüter agreed that having the option to alter the error mode at runtime complicates code, and felt that the switch should be avoided in future. Lukas agreed that it could be problematic, but defended the decision to have such a switch in PDO, which was built this way from the ground up.
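For readers who haven't met it, the PDO switch in question is the PDO::ATTR_ERRMODE attribute. The sketch below assumes the pdo_sqlite extension is available and uses an in-memory database; the function name and the missing table are made up for illustration.

```php
<?php

// Returns [result in silent mode, whether exception mode threw] for a
// query against a table that does not exist.
function demo_error_modes(): array
{
    $pdo = new PDO('sqlite::memory:');

    // Silent mode: failures are reported through return values and errorInfo().
    $pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_SILENT);
    $silentResult = $pdo->query('SELECT * FROM no_such_table');

    // Exception mode: the same failure raises a PDOException instead.
    $pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    $thrown = false;
    try {
        $pdo->query('SELECT * FROM no_such_table');
    } catch (PDOException $e) {
        $thrown = true;
    }

    return [$silentResult, $thrown];
}

// Guarded so the script still runs where pdo_sqlite is not installed.
if (extension_loaded('pdo_sqlite')) {
    list($silentResult, $thrown) = demo_error_modes();
    var_dump($silentResult); // bool(false)
    var_dump($thrown);       // bool(true)
}
```

The same call site therefore needs two styles of error handling depending on a setting made elsewhere, which is the complication Tony and Johannes were objecting to.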

Larry Garfield promptly wrote that he'd had issues in the past with non-exception error mode handling in PDO, but Lukas pointed out that wrong error values indicate a bug rather than a feature. Larry explained that he hadn't been able to make a viable bug report from the code he'd been using. He'd found that the exception based checking made the code much nicer to deal with in any case, and saw the error mode - with its variable error array size - as flawed. Whatever. Larry's main thrust was that exceptions are popular precisely because they provide a clean and powerful mechanism; but he agreed that it should remain possible to use PHP without having try/catch blocks scattered through the script.

Short version: Inconclusive.

TLK: T_IMPORT vs T_USE [again]

Sebastian Nohn chased the Zend team for the results of their endeavours to support the use of reserved words in method names, function names and class names. Stas responded with the news that they had a simple patch to allow this for import, but that it was still undergoing testing. Andi Gutmans noted that applying the patch would mean breaking tokenizers and syntax highlighters; they'd need to actually parse the keyword to gain context. For this reason, he was beginning to think it might be best to stick with the rule that a keyword is always a keyword. In turn, this would mean dropping import and adopting use.
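Andi's concern is easier to see from the tokenizer's side. In this small sketch, which assumes the tokenizer extension is enabled, token_get_all() classifies use as a keyword purely from its spelling, with no need to parse the surrounding code; the input string is an arbitrary example.

```php
<?php

// Assumes the tokenizer extension (enabled by default in standard builds).
$tokens = token_get_all('<?php use Foo; $use = 1;');

$names = [];
foreach ($tokens as $token) {
    // Array tokens are [id, text, line]; single characters arrive as plain strings.
    $names[] = is_array($token) ? token_name($token[0]) : $token;
}

// The bare word "use" is reported as T_USE, while "$use" is a T_VARIABLE.
echo implode(' ', $names), "\n";
```

A syntax highlighter built on this output can colour keywords without a parser, which is what a context-sensitive import would have broken.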

Pierre-Alain Joye pointed out that this would also mean keywords can't be added in minor PHP versions, and added - in case anyone had missed it - that having import as a keyword would break a lot of existing applications. He'd rather have Stas' patch or move to use, which returned fewer than ten Google codesearch results. Johannes mentioned that use has been a reserved keyword in PHP 5 since inception. Then again, without the patch the keyword namespace still might break existing code. Pierre hadn't realized that use was reserved. He wondered, in that case, what was holding up the switch? The namespace keyword was far less problematic than import, being used mainly where PHP 5 provides a native solution.

Stas thought it a good moment to mention that the patch in question only supports the import keyword, before anyone got carried away. Marcus wondered why import had even been considered, given that use had been reserved as a keyword long ago. And anyway, package was a better name than namespace... Stas pointed out that reserving package wouldn't resolve any of the problems under discussion; a lot of applications use it as an identifier, PEAR not least among them.

Short version: T_USE wins hands-down.

TLK: Object arithmetic [continued]

Stas picked up on Rob Richards' query from last week about inappropriate typing. He wrote that he wasn't sure how to fix this; there should be some type used, and many objects wouldn't provide a means of conversion to float. Stas thought it better to explicitly cast the type, since relying on the default conversion was 'probably not the best idea' in the context of arithmetic operations.

Alexey wondered why SimpleXML couldn't be forced to cast to string, leaving the Zend Engine to cast to float. Stas pointed out that the addition operator requires numeric values.

Rob, ignoring the diversion, wrote that he would have agreed with Stas if it weren't for __toString(), and threw him the following example of the current deficiency:

<?php

$xml = '<root><a>1.1</a><b>2.2</b></root>';
$sxe = simplexml_load_string($xml);
print $sxe->a . " + " . $sxe->b . " = " . ($sxe->a + $sxe->b);
// 1.1 + 2.2 = 3

?>

Besides, Rob didn't see why objects that happily provide a long converter shouldn't provide a float converter.
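Stas' suggestion of casting explicitly might look like the sketch below. It assumes the SimpleXML extension is enabled and an input document whose a and b elements hold 1.1 and 2.2; it is an illustration, not code from the thread.

```php
<?php

// Assumes the SimpleXML extension (enabled by default in standard builds).
$sxe = simplexml_load_string('<root><a>1.1</a><b>2.2</b></root>');

// Explicit casts make the intended numeric type unambiguous,
// instead of relying on the object's default conversion.
$sum = (float) $sxe->a + (float) $sxe->b;

printf("%.1f\n", $sum); // 3.3
```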

Short version: Just when you thought all __toString() related issues were resolved...

NEW: PHP 5.2.5 RC1

Ilia Alshanetsky, as Release Master for the PHP 5.2 series, announced the availability of PHP 5.2.5 RC1 for testing:

The first release candidate of 5.2.5 was just released for testing and
can be downloaded here:
http://downloads.php.net/ilia/php-5.2.5RC1.tar.bz2
(md5sum:f0c9ecbd50213958e9b69ec69f715ec)

The Windows binaries should become available shortly as well. Please
test this release against your code and let us know if you come
across any new problems or regressions. If all goes well, I
anticipate RC2 within 2 weeks, followed by the final release a week
later. I'd like to ask all developers to avoid making any changes
in the 5.2 branch that are not bug fixes and at the same time review the
pending issues on bugs.php.net to see if any outstanding bugs can be
fixed by RC2.

Short version: Keeping the stable version stable is just as important as the bleeding-edge stuff. Please test.

TLK: Class resolution

Chuck Hagenbuch sparked a long and convoluted discussion when he complained that class resolution is wrongly ordered when autoloading namespaced code. Chuck had three files (attached to his mail). The demo.php file includes test.php directly and test_exception.php through its __autoload() function; both files declare the namespace Test. When Test::Tester->fail() is called from the demo file, the Tester class should throw a Test::Exception - in theory. In practice, demo.php resulted in a fatal error: Uncaught exception 'Exception' in test.php:7. Chuck reasoned that __autoload() was never called, and the class resolution order must be:

  • does the class exist in the current namespace?
  • if not, does the class exist in the global namespace?
  • if not, try autoload

He believed the autoload should be more finely tuned to avoid having to explicitly load any redefined global classes:

  • does the class exist in the current namespace?
  • if not, can the class be autoloaded with the current namespace?
  • if not, does the class exist in the global scope?
  • if not, try global autoload
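The two orderings can be compared with a toy model. Everything here is an assumption made for illustration: class tables are plain arrays rather than the engine's structures, names use the :: separator from this discussion, and the final autoload step of the existing order is modelled as a lookup of the namespaced name only.

```php
<?php

// $declared:     fully qualified names already defined
// $autoloadable: fully qualified names an autoloader could provide

// The order Chuck inferred from the existing behaviour.
function resolve_current(string $name, string $ns, array $declared, array $autoloadable): ?string
{
    $local = $ns . '::' . $name;
    if (in_array($local, $declared, true)) {
        return $local;                       // 1. current namespace
    }
    if (in_array($name, $declared, true)) {
        return $name;                        // 2. global namespace
    }
    // 3. autoload (assumed here to be asked for the namespaced name)
    return in_array($local, $autoloadable, true) ? $local : null;
}

// The four-step order Chuck proposed.
function resolve_proposed(string $name, string $ns, array $declared, array $autoloadable): ?string
{
    $local = $ns . '::' . $name;
    if (in_array($local, $declared, true) || in_array($local, $autoloadable, true)) {
        return $local;                       // 1-2. current namespace, then its autoload
    }
    if (in_array($name, $declared, true) || in_array($name, $autoloadable, true)) {
        return $name;                        // 3-4. global scope, then global autoload
    }
    return null;
}

// Chuck's scenario: global Exception is built in, Test::Exception is only autoloadable.
$declared     = ['Exception', 'Test::Tester'];
$autoloadable = ['Test::Exception'];

echo resolve_current('Exception', 'Test', $declared, $autoloadable), "\n";  // Exception
echo resolve_proposed('Exception', 'Test', $declared, $autoloadable), "\n"; // Test::Exception
```

Under the first order the autoloader is never consulted because the global class already matches, which is consistent with the fatal error Chuck reported.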

Another way around the problem might be to force users to prefix all global classes with :: within a namespace; since back compatibility is already broken with namespaced code, retaining BC isn't really an issue.

Greg double checked; was Chuck suggesting namespace specific autoloading? If so, it sounded as if it could be overly complex. He could see another solution without the need for that:

<?php

namespace Test;
import Test::Exception;

class Tester {
    public function fail() {
        throw new Exception();
    }
}

?>

Greg recognized that this wasn't totally intuitive, but felt that an explicit declaration of any external classes at the top of the file would be a big bonus from the maintenance perspective.

Chuck explained that he'd been thinking in terms of A::B::C::Exception vs ::Exception; he hadn't thought about the possibility that A::B::Exception might also exist, and agreed with Greg that this would quickly get ugly. With that out of the way, he didn't have any better ideas than Greg's userland workaround for the time being, but observed that import combined with autoloading would affect all subsequent included files rather than just the immediate file. Perhaps all class names not prefixed with :: should refer to classes within the current namespace, with no fallback. Greg concurred that this would be liveable in his opinion, but didn't have time to back it up with a patch.

Stas caught up with the conversation at this point. He confirmed that Chuck's assumptions about the autoload ordering were essentially correct, but pointed out that the four-step autoload checks he'd envisaged would make it harder to use global classes (and therefore harder to convert older applications to use namespaces). Stas explained the rationale behind the ordering. It was assumed that the global Exception class was likely to be used more frequently than an extended Exception class; the fact that an extending class could be called Fred and still work had also come into it. The resolution rules were therefore designed to make it easy to work with the most frequent use case, and straightforward to migrate existing code. Stas saw the 'always prefix' idea as a non-starter, given migration considerations; 'the use of global classes is ubiquitous in current applications'. Basically, the problem wasn't that namespaced code wouldn't work with earlier PHP versions. The problem was that older code needed a simple upgrade path. Stas initially wrote that it would be better to use require_once where internal classes are overridden, but then saw Greg's solution and decided it was better. Far from being non-intuitive, it was immediately clear from the code exactly which exception class would be used.

Chuck wasn't at all sure about the idea that the standard global Exception class is the one most used in PHP, and cited the Zend Framework to illustrate his point. In his view, 'calling exception classes something other than Exception is kind of like giving up on the namespace'. That said, he agreed that Greg's explicit import solution looked like the best approach - when it came to dealing with global classes. The problem with using it for anything else was its propensity for propagation. It also didn't make a lot of sense to Chuck to have to import pieces of the current namespace. Stas didn't see Chuck's point about propagation, arguing that the name resolution for a file is defined by the code and import declarations within that file and 'import is just a way to write names shorter'. He also believed that the Exception class itself is a rare case, in that it's unusual to write code where the class names coincide with internal class names. All in all, Stas didn't see 'masking system classes with overloads and allowing only :: to work' as a good solution. Outside of anything else, the :: prefix would lead to an autoload call on each use of an internal class, overridden or not, which would significantly impact performance.

Chuck realized that Stas hadn't run the test code he'd posted and started again from the top, posting some more code to prove his point. The implementation as it stood didn't work the way Stas had said it did; the next file to be loaded would inherit an imported Exception override, without explicitly importing it itself. Chuck added a heartfelt plea for full control over namespaced elements, pointing out that he currently has the choice of listing every class used at the head of each file or being at the mercy of any new global class that comes along. Although he understood the argument that migration to namespace usage should be simple, he felt that there should be some encouragement to think about the implications. Everything in a namespaced file should be relative to that namespace, and any global element explicitly imported, in exactly the same way as variable scope works in PHP. Chuck almost ruined this moment of pure poetry by adding that he 'didn't have a grand theory of why variable scope and class scope should match'. He just thought most people would want namespaces to be as self-contained as possible, and that would mean explicitly importing external, rather than internal, elements into a namespace.

Chuck evidently felt rather passionate about this, and a number of emails went back and forth between himself and Stas. DateTime was of course mentioned in passing, since Stas insisted that clashes with internal class names weren't and wouldn't be an issue. Chuck went as far as threatening to write a patch himself ('scary as it might be for others') to force external dependencies to be declared at the head of a namespaced file. He didn't get that far, though; he'd tried Stas' suggestion of explicitly importing classes in the same namespace, and reported that this doesn't work. Surely the whole point was that PHP should be able to add new classes, and third-party code should be loaded, without affecting namespaced code?

Short version: Something's not right.

CVS: Freeze

Changes in CVS prior to 5.2.5RC1 that you should probably be aware of include:

  • Zend Engine bug #42859 (import always conflicts with internal classes) was fixed in 5_3 and HEAD [Greg Beaver, Dmitry]
  • Core bugs #42919 (Unserializing of namespaced class object fails) and #42722 (display_errors setting ignored for E_PARSE and HTTP 500 page) were fixed, the latter in 5_2 also [Dmitry]
  • Core bug #43020 (Warning message is missing with shuffle() and more than one argument) was fixed in the 5_3 and 5_2 branches [Scott MacVicar]
  • In the PHP_4_4 branch only, GD bug #43010 (regression in imagearc with two equivalent angles) was fixed [Pierre]

Depending on your choice of RDBMS, the big CVS news of the week was either the tweaking of the MySQL native driver in the PHP_5_3 branch to allow it to build under PHP 5.2.*, or the fact that the Firebird extensions finally have a dedicated maintainer in Lars Westermann. PHP's only Firebird user, Lester Caine, should be happier now.

Marcus had a funny half-hour when a CVS checkout fell apart on him. Several SPL commits to 'fix brnach (funny broken checkout)' ensued. Anyone thinking of searching the PHP bugs database for "brnach" as a result - take it from one who knows, and don't.

The only other thing of note was that Lukas now has php-src karma. Who knows, maybe he'll be the PHP 5.4 series RM after all...

Short version: Someone gave that man an account quick.

PAT: OnUpdateUTF8String

Christopher Jones of Oracle fame posted a patch created by one of their engineers to allow OnUpdateUTF8String() to work as he thought it should in PHP 6. He asked for feedback about the idea, but none was immediately forthcoming.

And finally, Dmitry applied a patch offered a couple of weeks ago by Bill Moran to the SOAP extension in PHP_5_2, PHP_5_3 and CVS HEAD. The patch fixes bug #42637 (SoapFault: Only http and https are allowed).

Short version: A bit of a wait, but we got there in the end.
