Zend Weekly Summaries Issue #348

REQ: get_object_id()
TLK: Installer extensions
NEW: Free Windows builds
TLK: Simple namespace proposal
TLK: ZTS and Apache prefork
TLK: unicode.semantics [continued]
RFC: RIP PHP 4?
RIP: In Memoriam
CVS: User streams
PAT: SNMP and escapeshellarg

1st July – 7th July 2007

REQ: get_object_id()

Pavel Shevaev followed up last week’s brief thread about __toString() and the
object ID with a question.
Why couldn’t there be a core PHP function, get_object_id(), that
does much the same thing as spl_object_hash()? Gwynne Raskind
was all for making this a function alias, but Marcus Börger put his foot
down. Sebastian Bergmann wrote that, as far as he’s concerned, SPL is
in the core… Pavel sincerely hoped it would be, by the time PHP 6 comes
out. Still, in his opinion get_object_id() was easier to
understand, and having it as standard would make it easier to find in the
documentation. Marcus pointed out that, technically speaking, ‘object id is
neither what you get nor what you want’; it’s not unique and therefore
not particularly helpful, and he had no intention of providing a function to
access it. Besides, spl_object_hash() is OO specific, so the SPL
extension is the perfect place for it. Pavel felt that he probably hadn’t put
his case very well, and tried again. What he wanted was a function that would
return the first element of var_dump() when applied to the same
object; its unique ID prefixed by the class name, something like
object(Foo)#1.

Lars Schultz, the originator of the thread last week, wrote that he’d wanted
that too. The old approach, with the automatic cast to string, had been
immensely useful for debugging. Why couldn’t that behaviour be the default
when a __toString() method is not implemented? Sebastian
explained that there is no unique object ID in the Zend Engine. Lars
wondered, in that case, why spl_object_hash() couldn’t be
invoked instead?

Pavel thought that even if the string only contained a counter for each
object type it was still useful, but someone named David explained that the
“counter” isn’t really a counter. Each “Object ID” is basically an index to
an element of an internal array that holds data for all the objects currently
in memory. Those handles are reused when an object is destroyed; it would be
difficult to ensure that the results of an object ID comparison were
meaningful. spl_object_hash() comparisons, on the other hand,
would always be correct. Stas Malyshev added to this; PHP extensions can have
their own handlers and IDs, and they don’t even need to adhere to the use of
classname and ID to uniquely identify an object.

Evidently not one for giving up easily, Pavel asked again for
spl_object_hash() to be renamed or aliased to
object_get_id(), since the name doesn’t very well describe its
functionality. Stas pointed out that spl_object_hash() does
actually return the unique object hash, and wondered what form the object ID
should take, given that there is no PHP type equivalent to “tuple of the C
pointer and long integer”. Sebastian suggested incrementing a counter at the
creation of every new object and associating the number with the object. Stas
agreed that this would solve the problem – if there is one – but pointed out
that implementing it would mean adding a new value into the internal object
structure and modifying every object-oriented extension in PHP. He wasn’t
sure it was that important to have a running numeric ID.

David figured it was simply a matter of documentation; at present it’s not
obvious that spl_object_hash() simply hashes the handler table
and handle tuple. Tony Dovgal wrote bluntly that documentation patches are
always welcome. Meanwhile David Zülke had read part of the thread, and
bounced in to explain to Pavel that two objects with the same data will give
the same hash:


var_dump(spl_object_hash(new stdClass()),
spl_object_hash(new stdClass()));


Lars explained that this has nothing to do with object data; the same memory
space is used for both objects because the first is cleared before the second
is instantiated. Instantiating both objects prior to calling


var_dump(spl_object_hash($x), spl_object_hash($y));


gives different results. Pavel grumbled that a new object should always have
a unique ID. Stas agreed that reusing IDs might not be such a good idea, but
pointed out that not reusing them wouldn’t necessarily be useful
either due to the limitations of long.
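The handle reuse that Lars and David describe can be seen in a small sketch. This is an illustration rather than anything posted to the list; the variable names are made up for the example, and whether the two temporaries end up with the same hash depends on the engine reusing the freed handle, which is common but not guaranteed.

```php
<?php
// Both objects are alive at the same time, so they occupy different handles
// and their hashes differ.
$x = new stdClass();
$y = new stdClass();
$liveHashesDiffer = spl_object_hash($x) !== spl_object_hash($y);
var_dump($liveHashesDiffer);

// Each temporary object is destroyed as soon as its hash has been taken,
// so the second object may be given the handle the first one just released.
$first  = spl_object_hash(new stdClass());
$second = spl_object_hash(new stdClass());
var_dump($first === $second); // frequently true, but an assumption, not a guarantee
```

The practical consequence is that a hash identifies an object only for as long as that object exists; comparing hashes taken from objects with non-overlapping lifetimes says nothing about identity.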

Short version: No, really. Use spl_object_hash().

TLK: Installer extensions

Scott MacVicar wrote to internals@ to say that the Windows Installer includes
a whopping 118 extensions with its distribution. To his knowledge, three of
those extensions crash either on startup or on shutdown. He therefore
proposed disabling every extension marked EXPERIMENTAL or without a PECL
release. Tony went further; he believed the installer should only include
extensions that are shipped alongside the PHP core in the standard Windows
distribution. It’s easy enough these days to find extension DLLs on pecl4win
if necessary, after all.

Installer author John Mertic explained that he simply repackages the
distribution zip files, php-xxx-win32.zip and
pecl-xxx-win32.zip. He thought the issue of suitability should be
addressed at the PECL level, and recommended signalling the package
status in much the same way PEAR packages do. Tony explained in turn that the
PHP distribution is only the former of these zip files. The second zip is
simply a bundle of PECL extensions put together at the time of the PHP
release for the sake of user convenience. Later, Tony added that PECL
packages are in fact marked in exactly the same way as PEAR packages already.

Scott was hopeful of grouping the extensions into “core” and “added
functionality”, but John wrote that this had been considered too confusing in
the past. Tony wanted to go further and remove PECL extensions from the
installer completely, pointing out that most of the people who would use an
installer in the first place have probably never even heard of PECL.

Short version: More PECL infrastructure woes.

NEW: Free Windows builds

Zoe Slattery of IBM announced that she had finally finished writing the
promised document giving step-by-step instructions for building PHP 5 or 6 on
a Windows system using the free tools available from Microsoft. Zoe has made
the instructions available on her blog,
and invited anyone with suggestions for improvement to comment there.

Guillaume Rossolini has already translated the document into French and made
it available on his site, developpez.com.

Short version: Just don’t try downloading those free tools over dial-up.

TLK: Simple namespace proposal

Out of nowhere (at least as far as anyone outside St. Petersburg knew),
Dmitry Stogov posted a patch offering namespace support for PHP 6 and asked
for review. He outlined the concept in his post to internals@. The main point
was that the aim of namespaces in PHP should
be simply to make class names manageable; anything beyond this was outside
the remit of namespace support. A namespace would be declared at the head of
a file using the keyword namespace, for example:


<?php

namespace Zend::DB;

class Connection {}
function connect() {}

?>


All class and function names within that file would automatically be prefixed
with the namespace name. It is possible to use the same namespace across
several files. The local name always takes precedence over a global name, but
the full name – for example, Zend::DB::connect() – can be used
anywhere. An import statement works for both namespaces and class names:


<?php

require 'Zend/Db/Connection.php';
import Zend::DB;
import Zend::DB::Connection as DbConnection;

$x = new Zend::DB::Connection();
$y = new DB::connection();
$z = new DbConnection();
DB::connect();

?>


but is only used to define name aliasing. import can only be
used in global scope, and takes effect from the point of the definition to
the end of the file in which it is written.

Any class or function name beginning with :: (the “empty”
namespace) is interpreted as global. There is a built-in constant,
__NAMESPACE__, that contains the current namespace and can be
used to construct fully qualified names:


<?php

namespace A::B::C;

function foo() {}

set_error_handler(__NAMESPACE__ . "::foo");


?>


In the global namespace, __NAMESPACE__ will have the value of an
empty string.

Dmitry went on to explain the rules for name resolution within a namespace,
but everyone was busy looking at the code by this time.

Phorum developer Brian Moon was the first to ask. Did Dmitry mean he could
type the same namespace declaration at the head of every file in an
application and all the code in those files would exist in the same
namespace? Dmitry wrote ‘Exactly’, but Stas had some qualms; the patch
currently doesn’t resolve names. Brian might need to use the
__NAMESPACE__ constant to achieve this end. Brian wondered about
namespaced variables, but Dmitry explained that variables and constants are
unaffected because both are defined at run-time. Stas pointed out helpfully
that arrays and classes are there for grouping variables, but Stefan Walk was
still unhappy about those constants, and asked if it would be possible to add
compile-time ones. Dmitry explained gently that PHP itself doesn’t have
those.

Stefan Priebsch had read through the less exciting stuff, and thought that
using :: as a namespace separator was probably causing more
problems than there need to be when it comes to resolving qualified
functions. He also wasn’t sure what would happen about include files, but
Stas reiterated the point that namespaces are per-file; include files can’t
“inherit” them. The same applies to the import statement. Stefan did,
however, discover one interesting thing. It’s possible to “override” internal
PHP functions. Stas didn’t see this as a problem, writing that ‘tools can
deal with it’. Stefan thought it would make an interesting feature
if it weren’t for the fact that it’s possible to “override” by accident. In
his view, it should be forbidden because of this. David Coallier agreed; an
E_NOTICE or E_WARNING, or maybe even an exception,
would be useful there. Stefan was very much in two minds about losing that
feature. Maybe there should be another keyword to make clear that the
overloading was deliberate… but he realized that ‘this may be a little
over the top’. Dmitry tried it out:


<?php

namespace UTF8;

overloaded class Exception {}
overloaded function strlen() {}

?>


He wasn’t sure there needed to be any special keyword, but promised to think
over the relationship between internal and PHP global functions. Someone
named Rich Buggy suggested scrapping the overloaded keyword and
using an INI setting to control error logging. Brian Moon wondered nervously
whether code like this would override any function in the global scope or
just built-in ones, but Dmitry reassured him that names within a namespace
would never conflict with userland names in the global namespace.

Andrei Zmievski wrote a simple note to Dmitry: ‘I love this. Let’s ship it.’

David Coallier was less certain; he wanted to be able to use curly brackets
to encapsulate a namespaced group. He was far from being alone in this.
Stefan Priebsch, while admiring the cleanliness of the namespace-per-file
concept, came up with a use case for multiple namespace declarations in a
single file. He explained that the single class per file with conditional
loading found in most PHP OO applications ‘doesn’t play well with caching’.
He’s working on gluing all project files together into a large
script that can be cached as a single application binary, and the restriction
to a single namespace per file would kill this admittedly experimental
approach.

Sebastian Bergmann wondered why Stefan didn’t simply expand his namespaces
during the “gluing” stage, leaving the final file with either a single
namespace or none. Stefan pointed out that this would mean parsing and
writing every file in the application; he also wasn’t sure that it would work
out when it came to dynamically creating class and function names. Perhaps
there needed to be a PHP function that would return a fully qualified name
for a given name; giving the Engine control over the expansion would ensure
that the correct rules were used. Sebastian agreed; he believed this
functionality should be added to the Reflection API.

Short version: Somebody did some thinking outside the box. (Kudos to Dmitry!)

TLK: ZTS and Apache prefork

Internals newbie Oliver Block was confused. He’d been advised to compile PHP
--with-maintainer-zts while working on the source. He wanted to
know whether PHP compiled in this way as an Apache module was supposed to
work without problems only on Apache worker, or whether it should work on
Apache prefork too?

Stas believed it might work, but didn’t see why anyone would want to do that
because the ZTS build is so much slower. Scott explained; he’d advised Oliver
to compile it this way so that he’d remember to add thread safety management
code where necessary. Stas agreed this was a good move, but made it clear
that he wouldn’t recommend running a ZTS PHP module with the Apache prefork
MPM for anything other than core/extension testing and development. Apache
developer William A. Rowe pointed out that ZTS gives the option of loading
the PHP module under prefork, worker or even the event MPM, and this leads to
ZTS being the build of choice for many OS bundles. He added, however, that
Stas was right in saying that it loses something in performance with the
prefork MPM. Derick Rethans regarded that performance hit as excessive, and
remarked that if anyone had found a distribution offering a ZTS version of
PHP with Apache prefork they should simply compile their copy of PHP from
source.

Short version: Don’t use this combination for anything other than
core/extension testing and development.

TLK: unicode.semantics [continued]

Richard Lynch had finally realized what it was that Tomas Kuliavas had been
trying to say throughout this (long, weary) thread. ‘Gak’, he wrote
expressively. Did Tomas mean that code like:


$mask = 0xf0;
$value = $_POST['foo'] & $mask;


would break under Unicode mode? ‘That can’t be right…’ Surely if
nothing new-fangled had been done to turn strings into Unicode they’d just be
ASCII, or assumed to be ASCII by PHP? Richard wasn’t surprised that using new
features from a new major release version would cause parser errors in older
PHP versions – but an old script ought to “just work”.

Matt Wilmas wrote that this particular piece of code shouldn’t break anyway,
given that both values are integers. In fact, he had similar issues with a
number of Tomas’ complaints. The point of this entire thread though was that
when unicode.semantics is switched on PHP 6 strings are Unicode
strings, without anyone having changed a thing, unless they are explicitly
cast to binary. This obviously does lead to different behaviour, and
this was the part Matt thought wrong; he’d prefer the onus to be on the
programmer to explicitly cast strings to Unicode. This way Unicode support
could be always available and completely back compatible too; only code
written specifically to be Unicode-aware would be affected.
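Why the operand types matter for a snippet like Richard’s can be shown with current PHP, where strings are plain byte strings. The sketch below is an illustration only: the literal '255' stands in for a value arriving via $_POST, and nothing here reproduces what a Unicode string operand would have done under unicode.semantics.

```php
<?php
$mask  = 0xf0;
$input = '255'; // stand-in for request data, which arrives as a string

// String & integer: the numeric string is converted to an integer first,
// so this is ordinary integer arithmetic (255 & 240).
$asInt = $input & $mask;
var_dump($asInt); // int(240)

// String & string: the operator works byte by byte on both operands
// (0x61 & 0x41 = 0x41 'A', 0x62 & 0x42 = 0x42 'B').
$asBytes = 'ab' & 'AB';
var_dump($asBytes); // string(2) "AB"
```

Only the second form depends on how a string is represented in memory, which is where a change of string semantics could alter results.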

Stas also replied to Richard. He explained that the problem with having some
automatic downgrade to ASCII is that UTF-16, which is used internally, is not
compatible with ASCII in the way that UTF-8 is. It wouldn’t work in all
situations.

Johannes Schlüter, getting back on topic, wrote that it’s just as simple
to install two different versions of PHP as to install two versions of PHP 6,
one with and one without Unicode support. The difference was that in PHP 6,
you’d be faced with two incompatible products with the same name, causing far
more difficulties both for hosts and for application developers. Cristian
Rodriguez agreed vehemently; if unicode.semantics can’t be set
at runtime it’s bound to be switched off in most installations because it
will break too much existing code otherwise. He couldn’t see a happy ending
here, and muttered something about this being worse than
magic_quotes and safe_mode. Derick agreed with him,
on the grounds that there’s no way to work around
unicode.semantics=on in userland code. Jani Taskinen explained
that this is exactly why he started this thread in the first place; he
believes the setting causes more problems than it solves. Rasmus Lerdorf
wrote that the real question was whether the team wanted a true Unicode mode
for PHP or not. He went on to say that Matt’s suggestion regarding explicit
control over Unicode support is exactly what is there already when
unicode.semantics=off. This is the mode that ISPs are expected
to run their shared servers in, and is also the mode that portable PHP
scripts should be written for. However, should having this mean there’s no
attempt to go all the way? Where would that leave PHP, five years down the
line? Rasmus ended by noting that the idea was to ‘minimize the issues
enough that the amount of code that… needs to be written twice will be
limited’; if this doesn’t work out, the full Unicode attempt will have
failed. At this stage, though, he wasn’t prepared to give up trying.

To Lukas Smith, the bigger question was how best to maintain the “fork” – in
two separate branches (i.e. PHP 5 and PHP 6), or in a single one? He still
believed that the first of those options was the better choice. Tony agreed,
given that Rasmus had more or less said the intention is to offer full
Unicode support to a very limited set of users. ‘You don’t buy a Porsche
if you need a taxi’, so why not simply backport the other new features to
PHP 5 and offer both branches, rather than have the switch in one?

Richard Quadling didn’t see why Unicode support couldn’t have been offered as
an extension in the first place. Tony mentioned ext/mbstring, but Stas
was quick to point out that mbstring offers nothing like the depth of
Unicode support offered by the ICU API. More helpfully, Derick explained that
the Zend Engine needs to be able to work with the Unicode library, and this
isn’t something that can be done from an extension. Richard wondered, then,
what the team anticipated for the future of PHP 4, given that PHP 5 hasn’t
really had a widespread uptake and PHP 6 offers functionality needed by only
a subset of users. Stefan Priebsch felt that the features backported to PHP 4
had been instrumental in holding back PHP 5 uptake; he really didn’t want to
see the same thing happen again. Lukas queried this; he didn’t recall
anything being backported to PHP 4, apart from the memory corruption
fix that forced PHP 4.4.0. Stefan made his point anyway, that PHP 6 needs
more selling points than Unicode support. Pierre waved namespaces at him, but
agreed that he’d rather not backport namespace support to PHP 5. Stefan
wondered how namespace support was getting along, and whether it had ever
been intended for PHP 6.0 anyway. Johannes patiently gave him the link to the
minutes of the PHP 6 planning meeting held nearly two years ago and suggested
he might read the other thread(s) running concurrently on the internals list.
Tony, though,
didn’t care about keeping namespace support exclusively in PHP 6, assuming
that the PHP “fork” is to be split into two separate branches.

Short version: A split over a split.

RFC: RIP PHP 4?

Derick, having been taken to task over his April 1st announcement about the
demise of PHP 4 at the end of 2007, wanted to gauge whether there was any
consensus for making that announcement for real. He added hastily that
security issues would still be fixed due to the size of the installed base,
but this should be the only thing that warrants a new PHP_4_4 release.

The next nine posts voted for this scenario, but Vesselin Kenashkov broke the
rhythm with the tenth. He thought PHP 4 support should continue until PHP 6 is
released. However, the next four posters didn’t make that distinction between
“when PHP 6 is released” and “the end of the year”, leaving the score at 14 to 1.

Rasmus Lerdorf broke the flow to point out that “security fixes only” has
actually been the extent of support offered for PHP 4 for the last year or
so, and demanded to know what Derick actually intended? To Rasmus, “dropping
support” meant no new releases at all, for any reason, and he didn’t feel the
time was yet ripe for this. Tony responded that “dropping support” to him
would mean closing all PHP 4-only bug reports and advising those users
troubled by them to upgrade to PHP 5. He’d only have releases when there were
critical security issues, and even those should stop some time over the next
couple of years. Stas argued that the security fixes aren’t the only problem;
the user base is such that important non-security fixes should go in too. This
being pretty much the level of PHP 4 support offered currently, he joined
Rasmus in his reluctance to regard it as “dropping support”. Rasmus, though,
was mostly concerned over the possibility of giving mixed messages to PHP
users. He wrote that he’d rather put out a statement now with ‘a final death
date on it’ and (apparently randomly) suggested 08/08/08 for that date.

Meanwhile, a further twelve posts voting to drop support for PHP 4 trickled in.

Jani Taskinen thought that simply having an official notice on the php.net
homepage would be enough to let hosting companies and users alike realize
that the end is nigh. He’d rather drop PHP 4 support entirely by the end of
2007 and focus fully on PHP 5 and 6. Stas pointed out that the team already
are focused fully on PHP 5 and 6; there hasn’t been a discussion about
anything PHP 4 related on the internals list in some time, security fixes
apart. He wanted an official phasing-out plan, and expected a full year to be
enough time for everybody to make their move. Derick was by now coming to this
conclusion too; he wanted there to be a clear statement that bug fixes will
end after 2007 and security fixes after (you guessed it) 08/08/08. Marco (not
sure which Marco) suggested that PHP 4 download links should be moved from
their prominent position on php.net to the PHP museum at the same time. There
was some support for this.

Andi Gutmans, much to Derick’s surprise and delight, agreed with all of it,
and recommended that the proposed changes to the homepage should be put in
place immediately. Jani still held out for dropping PHP 4 support by the end
of the year, pointing out that it would still be available for download –
just not supported. Andi vetoed that, arguing that PHP 4 users would need
time to plan their migration. A year’s notice seemed long enough to him.

Rasmus finally shared the roots of his 08/08/08 fixation with us all. It
seems the Chinese word for 8 sounds like 发, which means “prosper” or
“wealth”; the date is considered lucky in China.

Short version: The writing is on the wall. (And on the php.net homepage.)

RIP: In Memoriam

So. Farewell then
PHP 4.
Key to the Web
Empowerer of the proletariat
Glue of the people.

KISS!

That was your
Catchphrase.
It won you
Many friends.

Keith’s Mum says
She always liked you the best
but me and Keith reckon
It’s time to go live
With 5.

Short version: With apologies to E.J. Thribb (17½).

CVS: User streams

Changes in CVS that you should probably be aware of include:

  • Core bug #41865 was fixed by making the second parameter of fputcsv()
    mandatory [Mehdi Achour, Jani]
  • In ext/simplexml, bug #41867 (getName() is broken) was fixed [Rob
    Richards]
  • Also in ext/simplexml, bug #41861 (getNamespaces() returns namespaces of
    node’s siblings) was fixed [Rob]
  • The pgsql extension now compiles against PostgreSQL versions
    older than 7.4 (5_2 only) [Ilia]
  • In ext/openssl, bug #41770 (SSL: fatal protocol error due to buffer
    issues) was fixed in the PHP_5_2 branch only [Ilia]
  • Basic PDO->quote() functionality was added to PDO_OCI
    [Christopher Jones]
  • The bundled timezone database in ext/date was updated to 2007.6
    (2007f) [Derick]
  • Unicode XML should be working now in ext/simplexml in CVS HEAD [Dmitry]
  • Also in CVS HEAD, ext/pcre gained Unicode/binary support [Dmitry]

In other CVS news, Dmitry added the ability in CVS HEAD to create local or
remote user streams. As discussed a few weeks ago, local user streams are
not allowed to open URLs if the
allow_url_include INI directive is switched off. As part of his
patch – which was only very loosely based on the most recent of Stas’
offerings – Dmitry introduced a new userland function,
stream_is_local(), making it possible for application developers
to check the status of a given stream. He also extended
stream_wrapper_register() to accept an additional optional
argument, flags. For the present only one flag,
STREAM_IS_URL, has been implemented; it registers the user
stream wrapper as remote.
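A minimal sketch of how the two additions fit together follows. The wrapper class and the scheme names fixedlocal and fixedremote are hypothetical, invented for this illustration; only stream_wrapper_register(), its flags argument, STREAM_IS_URL and stream_is_local() come from the change described above.

```php
<?php
// Hypothetical wrapper class, for illustration only: it serves a fixed string.
class FixedStringStream
{
    public $context;          // set by PHP when the wrapper is used
    private $data = 'hello';
    private $pos = 0;

    public function stream_open($path, $mode, $options, &$opened_path)
    {
        return true;
    }

    public function stream_read($count)
    {
        $chunk = substr($this->data, $this->pos, $count);
        $this->pos += strlen($chunk);
        return $chunk;
    }

    public function stream_eof()
    {
        return $this->pos >= strlen($this->data);
    }
}

// Registered without flags: the wrapper counts as a local stream.
stream_wrapper_register('fixedlocal', 'FixedStringStream');

// Registered with STREAM_IS_URL: the wrapper counts as a remote (URL) stream.
stream_wrapper_register('fixedremote', 'FixedStringStream', STREAM_IS_URL);

// stream_is_local() accepts a URL string as well as an open stream resource.
$localIsLocal  = stream_is_local('fixedlocal://demo');
$remoteIsLocal = stream_is_local('fixedremote://demo');
var_dump($localIsLocal, $remoteIsLocal);
```

An application could use a check like this before passing a user-supplied path to include, treating anything that is not local with the same caution as a URL.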

With this much out of the way, Dmitry wrote to Ilia requesting permission to
backport the patch to the PHP_5_2 branch. He noted that ‘it looks like it
breaks binary compatibility but really it doesn’t’, which of course
guaranteed that everyone following the internals list would promptly go
through that code with a magnifying glass. Ilia immediately wanted to know
how Dmitry expected to change the PHP core globals struct without breaking
BC. Dmitry explained that the size of the structure is only important during
allocation in this case. Pierre was suspicious of it too, and wanted to know
if it would cause runtime warnings from his Ubuntu system, which is
apparently quite sensitive to mixed library versions. Rasmus pointed out that
binary compatibility is both backwards and forwards, and would be broken in
this case if any extension tried to access the new core globals element.
Dmitry couldn’t think of a single reason why an extension might want to do
that, but backed down anyway and wrote that the patch could wait for PHP 5.3
– just as Rasmus backed down over the assertion that no extension would use it.

Sean Finney wondered whether the struct size change might cause problems for
third party package maintainers, given that some PHP extensions aren’t
compiled against the local copy. Dmitry thought not, given that the struct is
allocated and initialized in the PHP core. Still, he wrote that he would hold
back the patch for now, and revert it if there were any problems reported
prior to the PHP 5.2.4 release. Ilia seemed happy with this, and agreed to it.

Short version: The security model for user streams will be
backported to the 5_2 branch.

PAT: SNMP and escapeshellarg

Sara Golemon posted an ext/simplexml patch for review. She wrote that
an empty() check on a node with children currently returns
FALSE because SimpleXML doesn’t apply PHP’s ‘emptiness rules’ to their
content. David Coallier promptly tested the patch, and pronounced that it
‘makes perfect sense’. Nobody else replied on-list;
Sara subsequently committed her fix.

Tony queried SNMP patch author Gustaf Gunnarsson over the timing of his calls
to snmp_free_pdu(), sensing a double free in the making (read:
crash). Gustaf explained that there was a clone involved in the process, and
referred Tony to the SNMP client and API sources.

One Tzachi Tager posted a potential solution for bug #40928
(escapeshellarg() does not quote percent (%)
correctly for cmd.exe). He thought the problem was that
escapeshellarg() and escapeshellcmd() under Windows
don’t utilize command line escaping, which does actually exist in the forms of
^ or ". The current code simply replaces special
characters with spaces. He attached a patch to fix the problem. The reporter
of the bug, Frode Moe, suggested that there should probably be a completely
different set of escape functions for the Windows platform to avoid the risk
of changing working functions elsewhere. He also commented that Tzachi’s
patch was ‘difficult to read’. Tzachi took the latter point and
followed up with a more readable version, but nobody else responded.

And finally, Tony applied a patch to fix Gentoo configuration bug #41908
(CFLAGS="-Os" ./configure --enable-debug fails). The patch was posted by the
bug reporter.

Short version: Tzachi’s patch needs a rethink, Gustaf’s is in PAT pending tests.