Zend Weekly Summaries Issue #349

NEW: Namespace support
TLK: SPL always-on?
TLK: RIP PHP 4 [continued]
TLK: unicode.semantics [continued]
NEW: Ever-ready mail()
TLK: Apache and multiple PHP versions
TLK: Cycle collection
NEW: ICU project
CVS: Local/remote user stream support in 5_2
PAT: LDAP controls, escapeshellarg, ZE stuff

NEW: Namespace support

Dmitry Stogov kicked off this week with one of his famous short messages:

Amazingly, there was virtually no response. Self-confessed lurker Markus
Fischer (oh come on now Markus, we remember you contributing!) wondered, a
few days down the line, if there might be any problems arising from using
paamayim nekudotayim (::) as the class separator? Stas Malyshev
put his mind at rest, explaining that since namespace shortcut names are
explicitly imported any ambiguity can always be resolved at compile time. And
that was it.

Of course, there did happen to be another namespace thread running at
the time. Some of it was even focused on Dmitry’s patch. Stefan Priebsch, for
example, held out for his multiple namespaces in a single file at first, but
eventually backed down in the face of a solid wall of advice from the elders
of the team. A few people were less circumspect and argued on, but no more
came of that line of enquiry.

David Coallier’s request for curly brackets to encapsulate a namespaced group
won a few fans among the users too, but absolutely none among the core
development team. Stas expressed the thoughts of many when he wrote that
encouraging PHP users to pollute the global space with non-namespaced
function names would render the whole concept of namespacing completely
pointless. Eventually the complaints died down, and everybody following
internals@ now knows that it’s best to keep their utility functions separate
from their namespaced classes. (Don’t they?)

Derick Rethans asked for clarification about the positioning of the namespace
declaration statement in a file. Dmitry had written that it should be ‘the
very first statement’, but that position is already reserved in PHP 6 for
the pragma() statement used to set the file encoding. Andrei
Zmievski had also picked up on this, and wrote that he’d already recommended
using the declare() statement for namespace declaration,
off-list. Dmitry replied, on-list, that the encoding statement could go
either before or directly after the namespace declaration.

With all the battles seemingly behind him, Dmitry committed the namespace
support patch into CVS HEAD mid-week as planned.

Short version: It finally got there.

TLK: SPL always-on?

The talk of __autoload() usage in the secondary ‘namespace
support’ thread led Stas to observe that ‘spl_autoload_register() works
much better for complicated libraries’. Stefan agreed to the extent that
he suggested removing __autoload() completely from PHP 6 and
making it impossible to disable ext/spl. Stas didn’t hate
__autoload() quite that much; he just felt it shouldn’t be used
in modular applications. There was a brief spat between the OO fans and the
procedural fans over the usefulness of SPL. This was followed by another
brief spat, this time between those who believe the application developer
should have responsibility for autoloading behaviour and those who believe it
should be in the hands of the language developer.
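
To make Stas' point concrete, here is a minimal sketch of two independent
libraries each registering their own loader. The Acme_ and Widget_ prefixes
and the $loaded log are hypothetical, and the closures assume a current PHP
release rather than anything available when this thread ran.

```php
<?php
// Hypothetical example: each library registers its own loader and neither
// needs to know about the other. A single global __autoload() function
// cannot be shared this way.
$loaded = [];

// Loader for a made-up "Acme_" library. A real loader would require a file;
// eval() is used here only to keep the sketch self-contained.
spl_autoload_register(function ($class) use (&$loaded) {
    if (strpos($class, 'Acme_') === 0) {
        eval("class $class {}");
        $loaded[] = "acme:$class";
    }
});

// Loader for a made-up "Widget_" library, registered independently.
spl_autoload_register(function ($class) use (&$loaded) {
    if (strpos($class, 'Widget_') === 0) {
        eval("class $class {}");
        $loaded[] = "widget:$class";
    }
});

// Each instantiation walks the loader stack until one defines the class.
$a = new Acme_Logger();
$w = new Widget_Button();
```

The loaders are tried in registration order, so a library only has to
recognise its own class names and ignore everything else.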

When the dust settled, Stefan pointed out that SPL is part of the PHP core in
the eyes of many because the manual (and peripheral documentation) says it is.
Jani Taskinen argued that the manual is simply wrong on this point; if you can disable
it, it isn’t part of the core. Stefan’s point had been that it should be part
of the core precisely because the manual claims it is. He – and many other PHP
users – had assumed that SPL support could always be relied upon in recent
versions of PHP and designed their applications accordingly.

Short version: Pulling the rug from under the feet can cause
distress in humans.

TLK: RIP PHP 4 [continued]

This thread seems to have taken up a huge chunk of my inbox, but it didn’t
actually have a lot of content. After around a hundred emails (that is this
week’s portion of the thread taken alone) there was a draft announcement for
php.net in place and Rasmus Lerdorf’s suggestion of 08/08/08 for the cutoff date
had been agreed. Someone (Alain Williams?) suggested adding an extra error
reporting bit in PHP 4 to clarify the areas of PHP 4 code that will be broken
in PHP 5. Most of the other 97 or so posts were either about easing the
upgrade or about marketing PHP 5.

Marco (I’m uncertain which Marco, since he gave no surname) was keen to start
a new thread to recruit volunteers to increase the knowledge base about
upgrading from PHP 4 to 5. Jani suggested that he do it somewhere other than
the internal development list, and the thread ended abruptly as everyone
realized they’d been off-topic all along.

Short version: etc.

TLK: unicode.semantics [continued]

Andrei spent some time answering points raised during the last week or so. To
Tomas Kuliavas, he explained that you cannot sanely work with bytes inside
Unicode strings. Either unicode.semantics would need to be
switched off or the binary cast would need to be used on the
strings. To pretty much everyone else, Andrei argued that backporting
non-Unicode features from PHP 6 to PHP 5 would just slow down PHP 6 adoption.
It would be better to get the entire PHP userbase into the idea of using
Unicode natively. Tony Dovgal disagreed; speaking as an internals developer,
he would much rather see a clean and straightforward PHP 6, without any
concern for back compatibility, than have to maintain code resulting from
something he sees as a bad design decision. Nicolas Bérard-Nault
backed him, citing the problems of the system-wide
unicode.semantics switch from a userspace perspective – not
least among them being that to work around it, ‘for the first time I’m
forced to explicitly specify a variable type in PHP’.

Christopher Jones agreed with Andrei that features shouldn’t be backported
from PHP 6, but for entirely different reasons; he felt that doing so would
complicate PHP 5 unnecessarily.

Stas pointed out to Tony that there would need to be internal support for
both Unicode strings and unstructured bit stream data anyway, regardless of
the switch. Johannes Schlüter argued that the majority of internal
structures offer support for one or the other depending on a configuration
option. It would be just as simple to run PHP 5 and PHP 6 together on a
server as it would to run two differently-configured versions of PHP 6.
Having two products with the same name seemed a bad idea to him.

Andi Gutmans proposed pulling together a PHP Compatibility Team to find and
document the explicit issues found when porting applications from earlier
versions of PHP to PHP 6, both with and without Unicode support. In some
cases, having that information at an early stage would mean the development
team could avoid breaking BC. To Andi, the current discussion was premature
and therefore doomed to focus on ‘purity’ aspects rather than actual issues.
Evert|Rooftop immediately volunteered his services and rather unfortunately
asked for the top ten PHP applications, which of course prompted another
flood of mail from anyone wanting the world to know their favourites.
Johannes noted that it might be a better idea to fix the existing test suite
failures first.

Tomas was of course able to report on SquirrelMail’s PHP 6 migration issues
immediately. For the PHP 4+ compatible software, he’d needed to remove calls
to session_unregister() and get_magic_quotes_gpc()
and ensure that unicode.semantics was switched off. Turning it
on, however, caused huge problems.

Sebastian Mendel was also able to report on phpMyAdmin. It ran without any
changes (this presumably when unicode.semantics=off?) apart from
a lot of E_STRICT and E_NOTICE error messages, all
of which also exist when the software is run under PHP 5. Losing the need for
PHP 4 compatibility would resolve that issue.

Derick posted a fairly vague note saying that eZ components hadn’t seemed too
bad under PHP 6 ‘a couple of months ago’ but were struggling now. He had
yet to investigate the reasons for this.

Uwe Schindler came into the discussion late to suggest to Tomas that he might
release a version of SquirrelMail with actual PHP 6/Unicode support, not least
because the i18n support currently offered by the software is buggy. Tomas
replied that he is no longer a member of the SquirrelMail development team;
he uses a modified version of the software that has fewer i18n bugs than the
official release(!).

Short version: The maintenance issues with PHP 6 are largely
internal rather than at userspace level.

NEW: Ever-ready mail()

Johannes didn’t see why PHP’s mail() function
should be disabled if the sendmail binary can’t be found, and posted a patch
to make the function always available. Jani asked sharply
what would happen if the binary didn’t exist on the system, but Johannes
explained that the function would simply return FALSE rather
than throw an error. Stas backed Johannes’ approach, and Cristian Rodriguez
wrote that he’d always wondered why it didn’t do that in the first place.

Jani raised the possibility that the path to the binary might simply be
wrong. Johannes mentioned the sendmail_path INI directive and
Jani backed down; he’d forgotten it exists. He contented himself with
recommending that HAVE_SENDMAIL should be obliterated, since
Johannes’ patch would render it obsolete. Johannes subsequently committed the
change to CVS HEAD and the PHP_5_2 branch.
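
In practice this means callers should treat a FALSE return from mail() as
‘not accepted for delivery’. A minimal sketch follows; the notify() helper and
its $mailer parameter are hypothetical, added only so the logic can be
exercised on a machine with no mail setup at all.

```php
<?php
// Hypothetical helper: hand a message to mail() and log when it is refused.
// $mailer defaults to PHP's own mail() but can be swapped for a stub.
function notify($to, $subject, $body, $mailer = 'mail')
{
    if (!call_user_func($mailer, $to, $subject, $body)) {
        // FALSE means the message was not accepted for delivery,
        // for example because no sendmail binary could be run.
        error_log("Could not hand message for $to to the mail transport");
        return false;
    }
    return true;
}
```

A TRUE return only says the message was accepted for delivery, not that it
arrived.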

Short version: Checking the return value of mail() should offer
some protection in all cases now.

TLK: Apache and multiple PHP versions

Tijnema came up with the idea of having a comment line ‘something like a
shebang line’ that would tell the Apache handler which version of PHP to
load. He wasn’t sure if this was even possible, but wrote that he’d be
prepared to do the grunt work if it was. Jani pointed out that it would be
much simpler to run PHP under Apache via FastCGI and use the file suffix to
define the PHP version, since this works already. Tijnema mentioned the words
‘portable code’, but Johannes mentioned the words ‘.htaccess file’ in
retaliation.

Guilherme Blanco popped up with a link to a GPL’d PHP class
that claims to make classes written in PHP fully version-agnostic
using a similar system to that proposed by Tijnema, but everyone ignored him.
Richard Lynch wrote that the real issue was preventing two PHP modules from
tromping on each others’ symbols; it used to be possible to load PHP
3 and PHP 4 Apache modules together, but this had been an anomaly. Someone
named “chris#” (<sigh />) asked mischievously which developer had broken that
feature in PHP 5? Jani – who coincidentally had removed the
--enable-versioning config option earlier in the week because it
couldn’t work but could cause crashes – wrote that the only way to support
multi-versioning in Apache/PHP 5 would be to patch the Apache SAPI sources.
Rasmus backed him, adding that it had never worked cross-platform
even in PHP 4 due to a reliance on system-specific behaviour. “chris#”
promptly offered the dev team access to a dedicated server while they figured
out a good way to support multiple versions of PHP as Apache modules in
parallel. He must have missed the second part of Rasmus’ post, which
explained that open source development relies on the passion and dedication
of the developer. If nobody cares enough about a task to spend time on it,
the job won’t get done.

Meanwhile Tijnema had come up with a simple script that could be added to the
apache2handler SAPI, allowing it to load multiple PHP modules. Stas pointed
out that the various Zend Engines would need to initialize and shut down on
every request, rather like in CGI, and asked a little wearily why Tijnema
couldn’t just use FastCGI as advised? The ideas grew a little wilder, then a
lot wilder. Rasmus eventually cracked and suggested that having some idea of
how PHP internals work might be useful when posting to a mailing list focused
on its development. Tijnema apologized, saying that he hadn’t understood the
purpose of the list but would now study the available documentation. Nobody
quite had the heart to tell him there isn’t any.

Short version: Noise, mostly.

TLK: Cycle collection

David Wang is a Google Summer of Code student whose brief is to implement a
garbage collector for circular references in PHP. He posted a mid-term status
report on internals@. It seems the initial phase of the
project is now complete, and David is onto the profiling, optimizing and
debugging part. His initial benchmarks showed a massive reduction in memory
usage – well over 50% – but with some performance impact – including a
doubling of the execution time when it came to templating. David ended his
post with an open invitation to download and review his code.

Sebastian Bergmann did so, and made some suggestions for optimization and
code clarity. Stas looked into the results, and was surprised at the extent
of the memory usage reduction. He wondered whether it would be possible to
optimize performance. David felt it probably was, but added that his current
priority was to reduce overhead for acyclic programs; he was seeing a 5%
slowdown in Zend/bench.php. Scott McVicar suggested making it possible
to start and stop garbage collection from within a PHP script to address this,
but David felt that was pretty much taken care of internally. Andi was
less sure, and asked a number of technical questions about the implementation
before requesting a patch against the baseline. David duly provided one, along
with a couple of new files introducing cycle collection into the Zend Engine.

Derick had meanwhile been doing some testing of his own, and noted that he
obtained even better results on his system than those David had reported. It
was Rasmus who ended the party. He wrote that it’s still so rare for PHP code
to have cyclic references that he didn’t believe David’s patch would make any
difference to real-world applications.
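
For anyone unsure what kind of code is at stake, here is a minimal sketch of a
reference cycle. The Node class is hypothetical, and gc_enable() and
gc_collect_cycles() are the userland functions of the collector that later
shipped in PHP 5.3; nothing in the thread says David’s patch exposed the same
interface at this stage.

```php
<?php
// Two objects that point at each other. Plain reference counting can never
// free them, because each keeps the other's count above zero.
class Node
{
    public $peer;
}

// Make sure the cycle collector is recording candidate roots.
gc_enable();

$a = new Node();
$b = new Node();
$a->peer = $b;
$b->peer = $a;

// Drop the only outside references; the pair is now unreachable garbage
// that still holds memory.
unset($a, $b);

// Force a collection run. The return value is the number of values freed.
$collected = gc_collect_cycles();
```

Code that never builds structures like this gains nothing from the collector
and pays only its overhead, which is the acyclic case David was trying to
optimize.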

Short version: Sounds good; needs real-world testing.

NEW: ICU project

Stas announced a new PECL project aimed at making it easier to support
international markets when using either PHP 5 (alongside UTF-8 encoding) or
PHP 6. The base for the extension is the ICU library, which is already used
in PHP 6, with the intention being to follow the ICU model. The extension will
be made up of largely independent modules, each of which will implement one
of: collation, number formatting, date/time formatting, locales, calendars,
international domain names, message formatting and resource bundles. There
are initial implementations of the collation and number formatting APIs for
PHP 5, which will be publicly available in the near future.

Most importantly, the project aims to ensure that any PHP 5 code using
pecl/intl functions will work in exactly the same way in PHP 6,
although the PHP 6 version may provide additional functionality. The
development team had decided that support for internationalization services
in PHP is needed sooner rather than later.

The mailing list to go to for discussion about the extension’s development is
the PHP Internationalization list at php-i18n@lists.php.net. LiveNation,
Yahoo! and Zend Technologies are backing the project.

Misunderstandings apart, the only feedback to this cheering news came from
Wikimedia developer Tim Starling, who requested that normalization be added
to that module list.

Short version: Forward compatible ICU functionality for PHP 5.

CVS: Local/remote user stream support in 5_2

Changes in CVS that you should probably be aware of include:

  • Zend Engine bug #41919 (crash in string to array conversion) was fixed [Ilia, Dmitry]
  • In ext/pdo_odbc, bug #41870 (PDO_ODBC module linking fails with iODBC) was fixed [Jani]
  • In ext/pdo_pgsql, bug #35981 (pdo-pgsql should not use pkg-config when not present) was fixed [Jani]
  • In ext/simplexml, bug #41947 (SimpleXML incorrectly registers empty strings as namespaces) was fixed [Rob Richards]
  • ReflectionClass::getDefaultProperties() now handles static attributes, closing feature request #41884 [Tony]
  • Sascha Schumann’s fix for bug #41815 (Concurrent read/write fails when EOF is reached) was backported to PHP_5_2 branch [Jani]
  • In the date extension, bugs #41964 (strtotime returns a timestamp for non-time string of pattern '(A|a) .+'), #41844 (Format returns incorrect number of digits for negative years -0001 to -0999), #41842 (Cannot create years < 0100 & negative years with date_create or new DateTime), #41709 (strtotime() does not handle 00.00.0000) and #41523 (strtotime('0000-00-00 00:00:00') is parsed as 1999-11-30) were fixed [Derick]
  • In ext/ldap, bugs #39291 (ldap_sasl_bind() misses the sasl_authc_id parameter) and #41127 (Memory leak in ldap_{first|next}_attribute functions) were fixed [Jani]
  • In the SOAP extension, bug #41635 (SoapServer and zlib.output_compression with FastCGI result in major slowdown) was fixed [Dmitry]

In other CVS news, Dmitry backported support for creating local or remote
user streams to the PHP_5_2 branch. Local user streams – the default is local
– cannot open() URLs if allow_url_include=off, and
there is a new function, stream_is_local(), which allows users
to check the stream type. As part of the same set of changes,
stream_wrapper_register() now has the additional optional
argument flags, which currently supports only one flag,
STREAM_IS_URL. This flag is used to signify that the userstream
wrapper is remote.
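
A minimal sketch of the two registration modes follows. The DemoWrapper class
and the demolocal and demoremote protocol names are hypothetical; the wrapper
does nothing useful, since only the registration flag matters here.

```php
<?php
// Hypothetical do-nothing wrapper; a real one would implement actual I/O.
class DemoWrapper
{
    public $context;

    public function stream_open($path, $mode, $options, &$opened_path)
    {
        return true;
    }

    public function stream_read($count)
    {
        return '';
    }

    public function stream_eof()
    {
        return true;
    }
}

// No flags: the wrapper is registered as local, which is the default.
stream_wrapper_register('demolocal', 'DemoWrapper');

// STREAM_IS_URL marks the wrapper as remote.
stream_wrapper_register('demoremote', 'DemoWrapper', STREAM_IS_URL);

// stream_is_local() accepts a URL string as well as an open stream.
$localIsLocal  = stream_is_local('demolocal://resource');
$remoteIsLocal = stream_is_local('demoremote://resource');
```

Because local is the default, a wrapper author has to opt in to being treated
as remote by passing the flag.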

Short version: Another weapon in the battle for security in PHP
applications.

PAT: LDAP controls, escapeshellarg, ZE stuff

Having noticed that the LDAP extension now has a dedicated maintainer in the
shape of Douglas Goldstein, Ignacio Arenaza posted some code. These were the
ancient LDAP control patches written by Pierangelo Masarati (in PAT forever),
updated to apply cleanly to PHP 5.2.3 and
accompanied by some tests. Ignacio had tested the patch using the OpenLDAP
2.2.23 libraries under both Debian Sarge (with the OpenLDAP 2.2.23 slapd
server) and Windows 2003 (with MS Active Directory). On both servers, where
the specific control is supported the tests pass.

Douglas himself appears to have missed that mail.

Stas commented on Tzachi Tager’s patch to fix the
escapeshellarg() Windows bug #40928 last week. He felt that Tzachi
had probably misunderstood Microsoft’s documentation, but Tzachi showed him
the results of his own tests using the PHP function. The jury’s still out on
that one.

Apropos of the earlier discussion about cycle collection, David Wang offered
a Zend Engine patch to convert access to zval.refcount, zval.is_ref and
_object_store.refcount to macros. None of this would have any
impact on functionality; it simply makes it easier to implement garbage
collection. Jani, Stas and Tony all called him out (nicely) over the
lack of adherence to PHP internals coding conventions and a certain lack of
clarity in naming. David – an equally busy person – didn’t press it further
at this stage.

Dmitry applied a patch from robin_fernandes at uk dot ibm dot com to fix Zend
Engine bug #41961 (Ensure search for hidden private methods does not stray
from class hierarchy).

Sara Golemon posted a patch for review ‘which exports an internals hook in
zend_class_entry for fetching function pointers similar to the object hook
get_method() available to instance methods.’ The patch also exports a
userspace hook, __call_static(), a version of
__call() for static calls. Sara hoped to apply the patch later
in the week.
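
As a rough illustration of the userspace side, the sketch below uses the hook
as it was eventually released in PHP 5.3, where it is spelled __callStatic()
rather than __call_static(). The Finder class and the byName() call are
hypothetical.

```php
<?php
class Finder
{
    // Calls to undefined methods on an instance land here.
    public function __call($name, $args)
    {
        return "instance:$name";
    }

    // Calls to undefined methods made statically land here.
    public static function __callStatic($name, $args)
    {
        return "static:$name";
    }
}

$viaStatic = Finder::byName('x');

$finder = new Finder();
$viaInstance = $finder->byName('x');
```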

Jani and Sebastian promptly called for the patch to be applied in the PHP_5_2
branch as well as in CVS HEAD. Stas pointed out that the patch breaks binary
compatibility and so will need to wait for PHP_5_3. Thomas Moenicke was keen
enough to review the code, and alerted Sara to an unnecessary function call.

And finally, Gwynne Raskind offered a patch that would
give PHP ‘a heredoc that acts like a single-quoted string.’ Calling it
‘nowdoc’, Gwynne’s idea was to trigger ‘nowdoc’ strings using
<<<~ to distinguish them from heredoc (<<<)
strings.
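
The difference in behaviour can be sketched as follows. Note that this uses
the quoted-identifier form in which nowdoc was eventually released in PHP 5.3,
not the <<<~ trigger proposed in Gwynne’s patch; the variable names are
illustrative only.

```php
<?php
$name = 'world';

// Heredoc: behaves like a double-quoted string, so $name is interpolated.
$heredoc = <<<EOT
Hello $name
EOT;

// Nowdoc: behaves like a single-quoted string, so $name stays literal.
$nowdoc = <<<'EOT'
Hello $name
EOT;
```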

Short version: Support for static calls, ‘nowdoc’, and some
long-awaited consensus over the functionality needed in ext/ldap.