
Zend Weekly Summaries Issue #348


REQ: get_object_id()
TLK: Installer extensions
NEW: Free Windows builds
TLK: Simple namespace proposal
TLK: ZTS and Apache prefork
TLK: unicode.semantics [continued]
RFC: RIP PHP 4?
RIP: In Memoriam
CVS: User streams
PAT: SNMP and escapeshellarg

1st July - 7th July 2007

REQ: get_object_id()

Pavel Shevaev followed up last week's brief thread about __toString() and the object ID with a question. Why couldn't there be a core PHP function, get_object_id(), that does much the same thing as spl_object_hash()? Gwynne Raskind was all for making this a function alias, but Marcus Börger put his foot down. Sebastian Bergmann wrote that, as far as he's concerned, SPL is in the core... Pavel sincerely hoped it would be, by the time PHP 6 comes out. Still, in his opinion get_object_id() was easier to understand, and having it as standard would make it easier to find in the documentation. Marcus pointed out that, technically speaking, 'object id is neither what you get nor what you want'; it's not unique and therefore not particularly helpful, and he had no intention of providing a function to access it. Besides, spl_object_hash() is OO specific, so the SPL extension is the perfect place for it. Pavel felt that he probably hadn't put his case very well, and tried again. What he wanted was a function that would return the first element of var_dump() when applied to the same object; its unique ID prefixed by the class name, something like object(Foo)#1.

Lars Schultz, the originator of the thread last week, wrote that he'd wanted that too. The old approach, with the automatic cast to string, had been immensely useful for debugging. Why couldn't that behaviour be the default when a __toString() method is not implemented? Sebastian explained that there is no unique object ID in the Zend Engine. Lars wondered, in that case, why spl_object_hash() couldn't be invoked instead?

Pavel thought that even if the string only contained a counter for each object type it was still useful, but someone named David explained that the "counter" isn't really a counter. Each "Object ID" is basically an index to an element of an internal array that holds data for all the objects currently in memory. Those handles are reused when an object is destroyed; it would be difficult to ensure that the results of an object ID comparison were meaningful. spl_object_hash() comparisons, on the other hand, would always be correct. Stas Malyshev added to this; PHP extensions can have their own handlers and IDs, and they don't even need to adhere to the use of classname and ID to uniquely identify an object.

Evidently not one for giving up easily, Pavel asked again for spl_object_hash() to be renamed or aliased to object_get_id(), since the name doesn't very well describe its functionality. Stas pointed out that spl_object_hash() does actually return the unique object hash, and wondered what form the object ID should take, given that there is no PHP type equivalent to "tuple of the C pointer and long integer". Sebastian suggested incrementing a counter at the creation of every new object and associating the number with the object. Stas agreed that this would solve the problem - if there is one - but pointed out that implementing it would mean adding a new value into the internal object structure and modifying every object-oriented extension in PHP. He wasn't sure it was that important to have a running numeric ID.
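
Sebastian's running-counter idea can be approximated in userland, which also hints at why Stas considered it invasive. The following is only a sketch under stated assumptions: SerialCounter, Tracked and Foo are hypothetical names invented for illustration, and nothing here reflects how the Zend Engine stores objects internally.

```php
<?php
// Hypothetical helper: hands out a process-wide serial number that is never reused.
final class SerialCounter
{
    private static $last = 0;

    public static function next(): int
    {
        return ++self::$last;
    }
}

// Hypothetical base class: each instance records its serial when constructed.
abstract class Tracked
{
    private $serial;

    public function __construct()
    {
        $this->serial = SerialCounter::next();
    }

    public function serial(): int
    {
        return $this->serial;
    }
}

final class Foo extends Tracked {}

$a = new Foo();   // serial 1
unset($a);        // the engine is now free to reuse this object's handle...
$b = new Foo();   // ...but the serial keeps counting: 2
$c = new Foo();   // 3
```

Every class has to opt in by extending the base class, and a clone would copy its original's serial unless __clone() assigned a fresh one. That mirrors, on a small scale, the objection that a built-in counter would touch every object-oriented extension.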

David figured it was simply a matter of documentation; at present it's not obvious that spl_object_hash() simply hashes the handler table and handle tuple. Tony Dovgal wrote bluntly that documentation patches are always welcome. Meanwhile David Zülke had read part of the thread, and bounced in to explain to Pavel that two objects with the same data will give the same hash:

var_dump(spl_object_hash(new stdClass()), spl_object_hash(new stdClass()));

Lars explained that this has nothing to do with object data; the same memory space is used for both objects because the first is cleared before the second is instantiated. Instantiating both objects prior to calling

var_dump(spl_object_hash($x), spl_object_hash($y));

gives different results. Pavel grumbled that a new object should always have a unique ID. Stas agreed that reusing IDs might not be such a good idea, but pointed out that not reusing them wouldn't necessarily be useful either due to the limitations of long.
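
The difference between the two calls can be put side by side. This is a minimal sketch, assuming only the behaviour described in the thread; the variable names are illustrative, and no particular hash value is assumed.

```php
<?php
$x = new stdClass();
$y = new stdClass();

// Both objects are alive at the same time, so their hashes differ,
// even though they hold identical (empty) data.
$distinct = spl_object_hash($x) !== spl_object_hash($y);

// The hash of one object stays the same for as long as that object lives.
$stable = spl_object_hash($x) === spl_object_hash($x);

// Temporaries are another matter: the first object may already have been
// destroyed when the second is created, so these two hashes can coincide.
$first  = spl_object_hash(new stdClass());
$second = spl_object_hash(new stdClass());
```

Whether $first and $second actually match depends on when the engine frees the first temporary, so the sketch deliberately makes no claim about it.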

Short version: No, really. Use spl_object_hash().

TLK: Installer extensions

Scott MacVicar wrote to internals@ to say that the Windows Installer includes a whopping 118 extensions with its distribution. To his knowledge, three of those extensions crash either on startup or on shutdown. He therefore proposed disabling every extension marked EXPERIMENTAL or without a PECL release. Tony went further; he believed the installer should only include extensions that are shipped alongside the PHP core in the standard Windows distribution. It's easy enough these days to find extension DLLs on pecl4win if necessary, after all.

Installer author John Mertic explained that he simply repackages the distribution zip files, php-xxx-win32.zip and pecl-xxx-win32.zip. He thought the issue of suitability should be addressed at the PECL level, and recommended signalling the package status in much the same way PEAR packages do. Tony explained in turn that the PHP distribution is only the former of these zip files. The second zip is simply a bundle of PECL extensions put together at the time of the PHP release for the sake of user convenience. Later, Tony added that PECL packages are in fact marked in exactly the same way as PEAR packages already.

Scott was hopeful of grouping the extensions into "core" and "added functionality", but John wrote that this had been considered too confusing in the past. Tony wanted to go further and remove PECL extensions from the installer completely, pointing out that most of the people who would use an installer in the first place have probably never even heard of PECL.

Short version: More PECL infrastructure woes.

NEW: Free Windows builds

Zoe Slattery of IBM announced that she had finally finished writing the promised document giving step-by-step instructions for building PHP 5 or 6 on a Windows system using the free tools available from Microsoft. Zoe has made the instructions available on her blog, and invited anyone with suggestions for improvement to comment there.

Guillaume Rossolini has already translated the document into French and made it available on his site, developpez.com.

Short version: Just don't try downloading those free tools over dial-up.

TLK: Simple namespace proposal

Out of nowhere (at least as far as anyone outside St. Petersburg knew), Dmitry Stogov posted a patch offering namespace support for PHP 6 and asked for review. He outlined the concept in his post to internals@. The main point was that the aim of namespaces in PHP should be simply to make class names manageable; anything beyond this was outside the remit of namespace support. A namespace would be declared at the head of a file using the keyword namespace, for example:

<?php

namespace Zend::DB;

class Connection {}
function connect() {}

?>

All class and function names within that file would automatically be prefixed with the namespace name. It is possible to use the same namespace across several files. The local name always takes precedence over a global name, but the full name - for example, Zend::DB::connect() - can be used anywhere. An import statement works for both namespaces and class names:

<?php

require 'Zend/Db/Connection.php';
import Zend::DB;
import Zend::DB::Connection as DbConnection;

$x = new Zend::DB::Connection();
$y = new DB::connection();
$z = new DbConnection();
DB::connect();

?>

but is only used to define name aliasing. import can only be used in global scope, and takes effect from the point of the definition to the end of the file in which it is written.

Any class or function name beginning with :: (the "empty" namespace) is interpreted as global. There is a built-in constant, __NAMESPACE__, that contains the current namespace and can be used to construct fully qualified names:

<?php

namespace A::B::C;

function foo() {}

set_error_handler(__NAMESPACE__ . "::foo");

?>

In the global namespace, __NAMESPACE__ will have the value of an empty string.
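
The snippets above use the syntax of Dmitry's patch, which will not run on a released PHP. Namespaces eventually shipped in PHP 5.3 with a backslash separator and use in place of import. For readers who want something executable, here is a rough equivalent in that shipped syntax; the return value of connect() is made up purely so that there is something to observe.

```php
<?php
// Shipped syntax (PHP 5.3 and later), not the syntax of the 2007 patch.
namespace Zend\DB;

class Connection {}

function connect(): string
{
    return 'connected';   // placeholder result, for illustration only
}

// __NAMESPACE__ holds the current namespace name as a string,
// so it can be used to build a fully qualified callable name.
$qualified = __NAMESPACE__ . '\\connect';

// A fully qualified name works anywhere; the leading "\" anchors it
// at the global namespace.
$x = new \Zend\DB\Connection();
$result = call_user_func($qualified);
```

As in the proposal, the declaration applies to the whole file, and the same namespace may be declared in as many files as needed.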

Dmitry went on to explain the rules for name resolution within a namespace, but everyone was busy looking at the code by this time.

Phorum developer Brian Moon was the first to ask. Did Dmitry mean he could type the same namespace declaration at the head of every file in an application and all the code in those files would exist in the same namespace? Dmitry wrote 'Exactly', but Stas had some qualms; the patch currently doesn't resolve names. Brian might need to use the __NAMESPACE__ constant to achieve this end. Brian wondered about namespaced variables, but Dmitry explained that variables and constants are unaffected because both are defined at run-time. Stas pointed out helpfully that arrays and classes are there for grouping variables, but Stefan Walk was still unhappy about those constants, and asked if it would be possible to add compile-time ones. Dmitry explained gently that PHP itself doesn't have those.

Stefan Priebsch had read through the less exciting stuff, and thought that using :: as a namespace separator was probably causing more problems than there need to be when it comes to resolving qualified functions. He also wasn't sure what would happen about include files, but Stas reiterated the point that namespaces are per-file; include files can't "inherit" them. The same applies to the import statement. Stefan did, however, discover one interesting thing. It's possible to "override" internal PHP functions. Stas didn't see this as a problem, writing that 'tools can deal with it'. Stefan thought it would make an interesting feature if it weren't for the fact that it's possible to "override" by accident. In his view, it should be forbidden because of this. David Coallier agreed; an E_NOTICE or E_WARNING, or maybe even an exception, would be useful there. Stefan was very much in two minds about losing that feature. Maybe there should be another keyword to make clear that the overloading was deliberate... but he realized that 'this may be a little over the top'. Dmitry tried it out:

<?php

namespace UTF8;

overloaded class Exception {}
overloaded function strlen() {}

?>

He wasn't sure there needed to be any special keyword, but promised to think over the relationship between internal and PHP global functions. Someone named Rich Buggy suggested scrapping the overload keyword and using an INI setting to control error logging. Brian Moon wondered nervously whether code like this would override any function in the global scope or just built-in ones, but Dmitry reassured him that names within a namespace would never conflict with userland names in the global namespace.
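
The "override" Stefan found can be reproduced with the namespace syntax that later shipped in PHP 5.3 (backslash separator; no overloaded keyword exists). The sketch below is an assumption-laden illustration rather than anything from the thread: the body of the replacement strlen() is invented, and it relies on the always-available PCRE extension to count UTF-8 code points.

```php
<?php
// Shipped syntax (PHP 5.3 and later); the "overloaded" keyword was never added.
namespace UTF8;

// Hypothetical replacement: counts UTF-8 code points rather than bytes.
function strlen(string $s): int
{
    // With the /u modifier "." matches one code point; /s lets it match newlines.
    return (int) preg_match_all('/./us', $s);
}

$text = "h\u{e9}llo";       // five characters, six bytes when encoded as UTF-8

$local  = strlen($text);    // unqualified call: resolves to UTF8\strlen()
$global = \strlen($text);   // leading "\" forces the built-in function
```

Nothing in the calling code signals that strlen() is no longer the built-in, which is exactly the accidental case Stefan was worried about.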

Andrei Zmievski wrote a simple note to Dmitry: 'I love this. Let's ship it.'

David Coallier was less certain; he wanted to be able to use curly brackets to encapsulate a namespaced group. He was far from being alone in this. Stefan Priebsch, while admiring the cleanliness of the namespace-per-file concept, came up with a use case for multiple namespace declarations in a single file. He explained that the single class per file with conditional loading found in most PHP OO applications 'doesn't play well with caching'. He's working on gluing all project files together into a large script that can be cached as a single application binary, and the restriction to a single namespace per file would kill this admittedly experimental approach.

Sebastian Bergmann wondered why Stefan didn't simply expand his namespaces during the "gluing" stage, leaving the final file with either a single namespace or none. Stefan pointed out that this would mean parsing and writing every file in the application; he also wasn't sure that it would work out when it came to dynamically creating class and function names. Perhaps there needed to be a PHP function that would return a fully qualified name for a given name; giving the Engine control over the expansion would ensure that the correct rules were used. Sebastian agreed; he believed this functionality should be added to the Reflection API.

Short version: Somebody did some thinking outside the box. (Kudos to Dmitry!)

TLK: ZTS and Apache prefork

Internals newbie Oliver Block was confused. He'd been advised to compile PHP --with-maintainer-zts while working on the source. He wanted to know whether PHP compiled in this way as an Apache module was supposed to work without problems only on Apache worker, or whether it should work on Apache prefork too?

Stas believed it might work, but didn't see why anyone would want to do that because the ZTS build is so much slower. Scott explained; he'd advised Oliver to compile it this way so that he'd remember to add thread safety management code where necessary. Stas agreed this was a good move, but made it clear that he wouldn't recommend running a ZTS PHP module with the Apache prefork MPM for anything other than core/extension testing and development. Apache developer William A. Rowe pointed out that ZTS gives the option of loading the PHP module under prefork, worker or even the event MPM, and this leads to ZTS being the build of choice for many OS bundles. He added, however, that Stas was right in saying that it loses something in performance with the prefork MPM. Derick Rethans regarded that performance hit as excessive, and remarked that if anyone had found a distribution offering a ZTS version of PHP with Apache prefork they should simply compile their copy of PHP from source.
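
Anyone unsure which kind of build a distribution has shipped can ask PHP itself. This is a small sketch using the predefined PHP_ZTS and PHP_SAPI constants; the variable names and the output format are arbitrary choices.

```php
<?php
// PHP_ZTS is truthy on a thread-safe (ZTS) build and falsy otherwise.
$threadSafe = (bool) PHP_ZTS;

// PHP_SAPI names the server API this process runs under.
$sapi = PHP_SAPI;

printf("SAPI: %s, thread-safe build: %s\n", $sapi, $threadSafe ? 'yes' : 'no');
```

Run through the web server rather than the command line, this reports on the build that Apache actually loaded.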

Short version: Don't use this combination for anything other than testing new extension development.

TLK: unicode.semantics [continued]

Richard Lynch had finally realized what it was that Tomas Kuliavas had been trying to say throughout this (long, weary) thread. 'Gak', he wrote expressively. Did Tomas mean that code like:

$mask = 0xf0;
$value = $_POST['foo'] & $mask;

would break under Unicode mode? 'That can't be right...' Surely if nothing new-fangled had been done to turn strings into Unicode they'd just be ASCII, or assumed to be ASCII by PHP? Richard wasn't surprised that using new features from a new major release version would cause parser errors in older PHP versions - but an old script ought to "just work".

Matt Wilmas wrote that this particular piece of code shouldn't break anyway, given that both values are integers. In fact, he had similar issues with a number of Tomas' complaints. The point of this entire thread though was that when unicode.semantics is switched on PHP 6 strings are Unicode strings, without anyone having changed a thing, unless they are explicitly cast to binary. This obviously does lead to different behaviour, and this was the part Matt thought wrong; he'd prefer the onus to be on the programmer to explicitly cast strings to Unicode. This way Unicode support could be always available and completely back compatible too; only code written specifically to be Unicode-aware would be affected.
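
Why the operand types matter for Richard's example can be seen in present-day PHP, where strings are plain byte sequences. This is only an illustration of the bitwise operator's two modes, with the literal '255' standing in for a submitted form value; it says nothing about how the PHP 6 branch behaved.

```php
<?php
$mask = 0xf0;

// Form input arrives as a string; '255' stands in for $_POST['foo'] here.
$input = '255';

// One integer operand: the numeric string is converted to an integer first.
$asInt = $input & $mask;    // 255 & 240

// Two string operands: the operator works on the strings' bytes instead.
$asBytes = 'a' & 'b';       // 0x61 & 0x62 = 0x60, the backtick character
```

The second form is the kind of byte-level code whose result depends on what a string is made of, which is where a change in string type could make itself felt.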

Stas also replied to Richard. He explained that the problem with having some automatic downgrade to ASCII is that UTF-16, which is used internally, is not compatible with ASCII in the way that UTF-8 is. It wouldn't work in all situations.

Johannes Schlüter, getting back on topic, wrote that it's just as simple to install two different versions of PHP as to install two versions of PHP 6, one with and one without Unicode support. The difference was that in PHP 6, you'd be faced with two incompatible products with the same name, causing far more difficulties both for hosts and for application developers. Cristian Rodriguez agreed vehemently; if unicode.semantics can't be set at runtime it's bound to be switched off in most installations because it will break too much existing code otherwise. He couldn't see a happy ending here, and muttered something about this being worse than magic_quotes and safe_mode. Derick agreed with him, on the grounds that there's no way to work around unicode.semantics=on in userland code. Jani Taskinen explained that this is exactly why he started this thread in the first place; he believes the setting causes more problems than it solves.

Rasmus Lerdorf wrote that the real question was whether the team wanted a true Unicode mode for PHP or not. He went on to say that Matt's suggestion regarding explicit control over Unicode support is exactly what is there already when unicode.semantics=off. This is the mode that ISPs are expected to run their shared servers in, and is also the mode that portable PHP scripts should be written for. However, should having this mean there's no attempt to go all the way? Where would that leave PHP, five years down the line? Rasmus ended by noting that the idea was to minimize the issues 'enough that the amount of code that... needs to be written twice will be limited'; if this doesn't work out, the full Unicode attempt will have failed. At this stage, though, he wasn't prepared to give up trying.

To Lukas Smith, the bigger question was how best to maintain the "fork" - in two separate branches (i.e. PHP 5 and PHP 6), or in a single one? He still believed that the first of those options was the better choice. Tony agreed, given that Rasmus had more or less said the intention is to offer full Unicode support to a very limited set of users. 'You don't buy a Porsche if you need a taxi', so why not simply backport the other new features to PHP 5 and offer both branches, rather than have the switch in one?

Richard Quadling didn't see why Unicode support couldn't have been offered as an extension in the first place. Tony mentioned ext/mbstring, but Stas was quick to point out that mbstring offers nothing like the depth of Unicode support offered by the ICU API. More helpfully, Derick explained that the Zend Engine needs to be able to work with the Unicode library, and this isn't something that can be done from an extension. Richard wondered, then, what the team anticipated for the future of PHP 4, given that PHP 5 hasn't really had a widespread uptake and PHP 6 offers functionality needed by only a subset of users. Stefan Priebsch felt that the features backported to PHP 4 had been instrumental in holding back PHP 5 uptake; he really didn't want to see the same thing happen again. Lukas queried this; he didn't recall anything being backported to PHP 4, apart from the memory corruption fix that forced PHP 4.4.0. Stefan made his point anyway, that PHP 6 needs more selling points than Unicode support. Pierre waved namespaces at him, but agreed that he'd rather not backport namespace support to PHP 5. Stefan wondered how namespace support was getting along, and whether it had ever been intended for PHP 6.0 anyway. Johannes patiently gave him the link to the minutes of the PHP 6 planning meeting held nearly two years ago and suggested he might read the other thread(s) running concurrently on the internals list. Tony, though, didn't care about keeping namespace support exclusively in PHP 6, assuming that the PHP "fork" is to be split into two separate branches.

Short version: A split over a split.

RFC: RIP PHP 4?

Derick, having been taken to task over his April 1st announcement about the demise of PHP 4 at the end of 2007, wanted to gauge whether there was any consensus for making that announcement for real. He added hastily that security issues would still be fixed due to the size of the installed base, but this should be the only thing that warrants a new PHP_4_4 release.

The next nine posts voted for this scenario, but Vesselin Kenashkov broke the rhythm with the tenth. He thought PHP 4 support should continue until PHP 6 is released. However, the next four posters didn't make that distinction between "when PHP 6 is released" and "the end of the year", leaving the score at 14 to 1.

Rasmus Lerdorf broke the flow to point out that "security fixes only" has actually been the extent of support offered for PHP 4 for the last year or so, and demanded to know what Derick actually intended? To Rasmus, "dropping support" meant no new releases at all, for any reason, and he didn't feel the time was yet ripe for this. Tony responded that "dropping support" to him would mean closing all PHP 4-only bug reports and advising those users troubled by them to upgrade to PHP 5. He'd only have releases when there were critical security issues, and even those should stop some time over the next couple of years. Stas argued that the security fixes aren't the only problem; the user base is such that important non-security fixes should go in too. This being pretty much the level of PHP 4 support offered currently, he joined Rasmus in his reluctance to regard it as "dropping support". Rasmus, though, was mostly concerned over the possibility of giving mixed messages to PHP users. He wrote that he'd rather put out a statement now with 'a final death date on it' and (apparently randomly) suggested 08/08/08 for that date.

Meanwhile, a further twelve posts voting to drop support for PHP 4 trickled in.

Jani Taskinen thought that simply having an official notice on the php.net homepage would be enough to let hosting companies and users alike realize that the end is nigh. He'd rather drop PHP 4 support entirely by the end of 2007 and focus fully on PHP 5 and 6. Stas pointed out that the team already are focused fully on PHP 5 and 6; there hasn't been a discussion about anything PHP 4 related on the internals list in some time, security fixes apart. He wanted an official phasing-out plan, and expected a full year to be enough time for everybody to make their move. Derick was by now coming to this conclusion too; he wanted there to be a clear statement that bug fixes will end after 2007 and security fixes after (you guessed it) 08/08/08. Marco (not sure which Marco) suggested that PHP 4 download links should be moved from their prominent position on php.net to the PHP museum at the same time. There was some support for this.

Andi Gutmans, much to Derick's surprise and delight, agreed with all of it, and recommended that the proposed changes to the homepage should be put in place immediately. Jani still held out for dropping PHP 4 support by the end of the year, pointing out that it would still be available for download - just not supported. Andi vetoed that, arguing that PHP 4 users would need time to plan their migration. A year's notice seemed long enough to him.

Rasmus finally shared the roots of his 08/08/08 fixation with us all. It seems the Chinese word for 8 sounds like 发, which means "prosper" or "wealth"; the date is considered lucky in China.

Short version: The writing is on the wall. (And on the php.net homepage.)

RIP: In Memoriam

So. Farewell then
PHP 4.
Key to the Web
Empowerer of the proletariat
Glue of the people.

KISS!

That was your
Catchphrase.
It won you
Many friends.

Keith's Mum says
She always liked you the best
but me and Keith reckon
It's time to go live
With 5.

Short version: With apologies to E.J. Thribb (17½).

CVS: User streams

Changes in CVS that you should probably be aware of include:

  • Core bug #41865 was fixed: the second parameter of fputcsv() is no longer optional [Mehdi Achour, Jani]
  • In ext/simplexml, bug #41867 (getName() is broken) was fixed [Rob Richards]
  • Also in ext/simplexml, bug #41861 (getNamespaces() returns namespaces of node's siblings) was fixed [Rob]
  • The pgsql extension now compiles against PostgreSQL versions older than 7.4 (5_2 only) [Ilia]
  • In ext/openssl, bug #41770 (SSL: fatal protocol error due to buffer issues) was fixed in the PHP_5_2 branch only [Ilia]
  • Basic PDO->quote() functionality was added to PDO_OCI [Christopher Jones]
  • The bundled timezone database in ext/date was updated to 2007.6 (2007f) [Derick]
  • Unicode XML should be working now in ext/simplexml in CVS HEAD [Dmitry]
  • Also in CVS HEAD, ext/pcre gained Unicode/binary support [Dmitry]

In other CVS news, Dmitry added the ability in CVS HEAD to create local or remote user streams. As discussed a few weeks ago, local user streams are not allowed to open URLs if the allow_url_include INI directive is switched off. As part of his patch - which was only very loosely based on the most recent of Stas' offerings - Dmitry introduced a new userland function, stream_is_local(), making it possible for application developers to check the status of a given stream. He also extended stream_wrapper_register() to accept an additional optional argument, flags. For the present only one flag, STREAM_IS_URL, has been implemented; it registers the user stream wrapper as remote.
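
In use, the two additions might look something like the sketch below. DemoWrapper and the demolocal and demoremote protocol names are hypothetical, and the wrapper is a do-nothing stub because only the registration flag is of interest here.

```php
<?php
// Hypothetical stub wrapper; only its registration matters for this sketch.
class DemoWrapper
{
    public $context;   // populated by PHP when a stream is opened

    public function stream_open($path, $mode, $options, &$openedPath)
    {
        return true;
    }
}

// Without flags, a user wrapper is registered as local.
stream_wrapper_register('demolocal', 'DemoWrapper');

// STREAM_IS_URL registers the wrapper as remote.
stream_wrapper_register('demoremote', 'DemoWrapper', STREAM_IS_URL);

// stream_is_local() accepts either an open stream or a URL string.
$localIsLocal  = stream_is_local('demolocal://resource');
$remoteIsLocal = stream_is_local('demoremote://resource');
```

Here $localIsLocal comes out true and $remoteIsLocal false, which gives an application a way to check a stream's status before including from it.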

With this much out of the way, Dmitry wrote to Ilia requesting permission to backport the patch to the PHP_5_2 branch. He noted that 'it looks like it breaks binary compatibility but really it doesn't', which of course guaranteed that everyone following the internals list would promptly go through that code with a magnifying glass. Ilia immediately wanted to know how Dmitry expected to change the PHP core globals struct without breaking BC. Dmitry explained that the size of the structure is only important during allocation in this case. Pierre was suspicious of it too, and wanted to know if it would cause runtime warnings from his Ubuntu system, which is apparently quite sensitive to mixed library versions. Rasmus pointed out that binary compatibility is both backwards and forwards, and would be broken in this case if any extension tried to access the new core globals element. Dmitry couldn't think of a single reason why an extension might want to do that, but backed down anyway and wrote that the patch could wait for PHP 5.3 - just as Rasmus backed down over the assertion that no extension would use it.

Sean Finney wondered whether the struct size change might cause problems for third party package maintainers, given that some PHP extensions aren't compiled against the local copy. Dmitry thought not, given that the struct is allocated and initialized in the PHP core. Still, he wrote that he would hold back the patch for now, and revert it if there were any problems reported prior to the PHP 5.2.4 release. Ilia seemed happy with this, and agreed to it.

Short version: The security model for user streams will be backported to the 5_2 branch.

PAT: SNMP and escapeshellarg

Sara Golemon posted an ext/simplexml patch for review. She wrote that an empty() check on a node with children currently returns FALSE because SimpleXML doesn't apply PHP's 'emptiness rules' to their content. David Coallier promptly tested the patch, and pronounced that it 'makes perfect sense'. Nobody else replied on-list; Sara subsequently committed her fix.

Tony queried SNMP patch author Gustaf Gunnarsson over the timing of his calls to snmp_free_pdu(), sensing a double free in the making (read: crash). Gustaf explained that there was a clone involved in the process, and referred Tony to the SNMP client and API sources.

One Tzachi Tager posted a potential solution for bug #40928 (escapeshellarg() does not quote percent (%) correctly for cmd.exe). He thought the problem was that escapeshellarg() and escapeshellcmd() under Windows don't utilize command line escaping, which does actually exist in the forms of ^ or ". The current code simply replaces special characters with spaces. He attached a patch to fix the problem. The reporter of the bug, Frode Moe, suggested that there should probably be a completely different set of escape functions for the Windows platform to avoid the risk of changing working functions elsewhere. He also commented that Tzachi's patch was 'difficult to read'. Tzachi took the latter point and followed up with a more readable version, but nobody else responded.

And finally, Tony applied a patch to fix Gentoo configuration bug #41908 (CFLAGS="-Os" ./configure --enable-debug fails). The patch was posted by the bug reporter.

Short version: Tzachi's patch needs a rethink, Gustaf's is in PAT pending tests.
