Zend Weekly Summaries Issue #358


TLK: PHP 5.3 feature poll
TLK: INI parser caching
TLK: Phar agonistes
TLK: Constant folding optimization [continued]
FIX: run-tests.php
PAT: array_get
RFC: Multiple namespaces per file
TLK: Accessing global namespace
FIX: mail.force_extra_parameters
RFC: Marking functions as const
CVS: Namespace/autoload problem resolved
PAT: LSB, GC discussions

9th September – 15th September 2007

TLK: PHP 5.3 feature poll

Release Master hat firmly in place, Ilia Alshanetsky opened a new thread to
discuss the shaping of PHP 5.3. Since his request for feature suggestions a couple of weeks ago,
he’d collected ‘a substantial list of key changes’ that people would
like to see in the 5.3.0 release. Ilia therefore asked all interested parties
to vote on the following items:

  • Namespace support
  • pecl/intl as a core extension, disabled by default (external library)
  • Late static binding
  • David Wang’s circular garbage collection patch
  • Support for SQLite 3 in ext/sqlite
  • Removal of safe_mode, register_globals and magic_quotes
  • mysqlnd in the core, used as a backend for ext/pdo_mysql and ext/mysqli and possibly enabled by default
  • Support for OpenID in ext/openssl
  • New array functions, array_replace[_recursive]()
  • E_DEPRECATED
  • The zend_arg_info const’ify patch
  • The GCC 4 -fvisibility patch
  • A switch for disabling/enabling materialized cursors in ext/mysqli
  • pecl/phar as a core extension (possibly enabled by default)
  • Matt Wilmas’ ZEND_SIGNED_MULTIPLY_LONG() optimization
  • New php.ini parser/scanner and CGI/FastCGI .htaccess-style INI support
  • __callStatic()
  • “strict classes” that do not permit dynamic property creation

We all did our best to oblige but ‘I am not qualified to offer an
intelligent opinion’ (as Larry Garfield had it) was the only honest
response most of us could give in some cases.

Surprisingly few of the inevitable side issues went as far as a sub-thread;
most people just voted as requested. Lester Caine went a little crazy when he saw
magic_quotes and friends on the list and had to be calmed down,
although most figured that that particular suggestion wouldn’t get through
even if everyone voted for it. Rob Richards wanted to know which OpenID
implementation was under discussion (Dmitry Stogov’s). Most voters didn’t
want to see the relatively untested mysqlnd library or garbage
collector enabled by default in 5.3.0, but were happy to have both in the PHP
core. Cristian Rodriguez, although being all for -fvisibility
support, wrote that the patch to implement it doesn’t actually work on his
box. I misread the “strict classes” option, but was rescued by Mike Wallner,
who spotted the mistake. Stas Malyshev, who didn’t quite get the point of
E_DEPRECATED, listed a bunch of items that were missing from
Ilia’s list: dynamic class access, the (binary) operator,
always-enabled FastCGI, nowdocs, goto,
__construct() in interfaces and constant folding at
compile-time. François Laupretre seized the opportunity to have a pop
at the decision process, which sounded like sour grapes to Marcus Börger.

Short version: Unusual restraint all round.

TLK: INI parser caching

Andi Gutmans wasn’t too sure about that proposal for .htaccess-style
INI support. He wrote that it should be optional, if it goes in at all,
because the stat() calls involved would hit performance quite
hard. Pierre-Alain Joye informed him that the plan was to cache the entries
and have a single call to stat() when TTL is reached. Andi was a
little happier, but asked that the implementation be generic in TSRM so that
it could be used across PHP. Jani Taskinen pointed out that actually his
patch caches already. However, he’d skipped calling stat() at
all; his implementation just re-scans existing files when the cached results
expire, since he’d thought it best to keep things simple. Pierre looked again
at Jani’s patch and agreed with him; it would be easy to maintain, work well
and keep the performance impact to a minimum.

Alexey Zakhlestin thought there might be a better approach. He wrote that
some operating systems – including the BSD-based systems, Linux and Windows –
allow subscription to notifications from the filesystem. Stas saw where Alexey
was going with this, and wondered aloud how expensive those notifications
would be. Pierre clarified a few points about Jani’s solution; the INI cache
works on a per-directory basis, caching the entries found in the current and
parent directories. The TTL that controls the checking is configurable; to
illustrate, the htscanner tool has it set to 5 minutes by
default. Alexey pointed out that using kqueue and friends would
mean the TTL part could be skipped altogether. PHP would simply register
itself as a subscriber to filesystem events in the relevant directories, and
could then react immediately to any changes in them. He agreed with Stas,
though, that the cost of filesystem notifications should be looked into.

Pierre wasn’t certain, but believed that there would need to be a local
session manager for each connection/registration. Besides, how would the
notifications be sent to PHP? All things considered, there were advantages to
the TTL system, and not least among them were simplicity and portability.
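
The TTL scheme discussed above could look roughly like this in userland PHP. This is a minimal sketch, not Jani’s patch (which is C code inside PHP): the DirIniCache class, its method names and the injectable clock are all hypothetical, chosen only to show “re-scan when the cached result expires, with no stat() call in between”.

```php
<?php
// Sketch only: the class and method names are hypothetical, not PHP internals.
class DirIniCache
{
    private $ttl;
    private $entries = [];   // directory => ['loaded' => time, 'settings' => array]
    public $scans = 0;       // counts how often the files were actually re-read

    public function __construct(int $ttl)
    {
        $this->ttl = $ttl;
    }

    // $scan re-reads the INI files for $dir; $now is injectable for testing.
    public function get(string $dir, callable $scan, ?int $now = null): array
    {
        $now = $now ?? time();
        $hit = $this->entries[$dir] ?? null;
        if ($hit === null || $now - $hit['loaded'] >= $this->ttl) {
            // No stat() call: once the TTL has passed, simply scan again.
            $this->scans++;
            $hit = ['loaded' => $now, 'settings' => $scan($dir)];
            $this->entries[$dir] = $hit;
        }
        return $hit['settings'];
    }
}

$cache = new DirIniCache(300);   // 300 seconds, matching the 5 minute default
$scan = function (string $dir): array {
    return ['display_errors' => '0'];   // stand-in for parsing real files
};
$cache->get('/var/www/app', $scan, 1000);   // first request: scans
$cache->get('/var/www/app', $scan, 1100);   // within the TTL: served from cache
$cache->get('/var/www/app', $scan, 1300);   // TTL reached: scans again
echo $cache->scans, "\n";                   // prints 2
```

The trade-off is the one raised in the thread: a change to an INI file can go unnoticed for up to one TTL, in exchange for doing no filesystem work at all on most requests.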

Short version: It all depends on who you think is the boss.

TLK: Phar agonistes

Another sub-thread from the feature voting was sparked when Andi voted
against pecl/phar going into the core. Andi reiterated his past
argument, that he’d prefer a standard format that could be manipulated with
standard tools, and added a new one; the Zend team’s testing had shown that
the TAR format gives better performance. That said, he wasn’t clear about
phar’s use case, and didn’t think Web applications were its target.
Dmitry backed Andi, writing that he’d also prefer ‘something more toolable
(like TAR or ZIP)’. Greg Beaver, whose tests had shown just the opposite,
wanted to know exactly how they’d benchmarked performance, but no immediate
response was forthcoming.

Greg therefore sat down and wrote quite a long email defending his baby.
Phar isn’t just an installation
stub; it works quite well in a Web environment and, due to the lack of
stat() calls needed, it’s even possible that applications
running directly from a phar archive will eventually be faster than
their filesystem equivalent when used alongside APC. Re-implementing
phar as JAR would require changes to the Zend Engine because it uses
the ZIP file format; re-implementing it as TAR would require restructuring
the file for performance reasons, rendering third-party tools useless. Long
story short, phar was implemented in the way it was for sound reasons.

Furthermore, there is a cross-platform tool, phar.phar, which is
provided as part of the phar extension and which, although not yet
feature complete, is capable of viewing, extracting and adding files to a
phar archive. Greg would be happy to look at other format
implementations, but he’d first need to hear some compelling arguments for
change; preferably, arguments that would take into account the work already
completed.

Andi clarified. He hadn’t intended to offend Greg, he’d simply expressed his
opinion. Perhaps Dmitry could send Greg something that would explain why the
Zend team achieved better results with TAR? Not that Andi was ‘married to
TAR’; any other familiar archiving format would be just as viable.
Apart from anything else, the introduction of a new format left several
unanswered questions about deployment. Where could he find more information?
Would mod_rewrite be needed? How would phar work under Windows?

Marcus pointed out that, as far as ‘toolable’ archive formats go, PHP already
has ext/zip in the core; he, for one, wouldn’t mind if someone provided
a working TAR implementation there too. The purpose of phar was very
different. Marcus wondered, too, how the Zend team had obtained their test
results; he suspected they must have used an old version of phar.
Dmitry heightened that impression when he posted a TAR format variant of phar, in
response to Andi’s earlier request. He’d written this himself as a proof of
concept and had used it in the Zend team’s tests, ‘some time ago’. He
was, however, unable to confirm precisely which version of phar had
been used in their benchmarks.

François, of all people, stoutly defended Greg’s work. He explained
that, in both pecl/phar and his own project, PHK, the TAR format had
been rejected because TAR access is sequential. If the Zend team could
prove that it was faster to reach the 100th file of a 1Mb archive using TAR,
he for one would be prepared to consider rewriting his project to use it.
That said, the files would then need to be precisely organized within the
archive, which would incur a performance hit. The only option would be to
lose the stub file. Losing the stub would also mean losing certain features.
For both phar and PHK, one such feature would be the independence from
external libraries; another would be the ability to directly include an
archive file from a PHP script.
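
François’ point about sequential access can be illustrated with a toy container. The sketch below is an assumption-laden illustration only: the length-prefixed format and the function names are invented here and match neither the TAR layout nor the phar or PHK layouts. It simply contrasts walking every header to reach an entry with looking the entry up in a manifest built in advance.

```php
<?php
// Sketch only: this toy format is invented for illustration.
function pack_entries(array $files): string
{
    $out = '';
    foreach ($files as $name => $data) {
        // Fixed-size header: name length and data length as 32-bit integers.
        $out .= pack('NN', strlen($name), strlen($data)) . $name . $data;
    }
    return $out;
}

// Sequential access: walk header by header until the name matches.
function find_sequential(string $archive, string $wanted, int &$headersRead): ?string
{
    $pos = 0;
    $headersRead = 0;
    while ($pos < strlen($archive)) {
        $h = unpack('Nname/Ndata', substr($archive, $pos, 8));
        $headersRead++;
        $name = substr($archive, $pos + 8, $h['name']);
        if ($name === $wanted) {
            return substr($archive, $pos + 8 + $h['name'], $h['data']);
        }
        $pos += 8 + $h['name'] + $h['data'];
    }
    return null;
}

// Indexed access: a manifest maps each name to its offset and length.
function build_manifest(string $archive): array
{
    $manifest = [];
    $pos = 0;
    while ($pos < strlen($archive)) {
        $h = unpack('Nname/Ndata', substr($archive, $pos, 8));
        $name = substr($archive, $pos + 8, $h['name']);
        $manifest[$name] = [$pos + 8 + $h['name'], $h['data']];
        $pos += 8 + $h['name'] + $h['data'];
    }
    return $manifest;
}

$files = [];
for ($i = 1; $i <= 100; $i++) {
    $files["file$i.php"] = "<?php // body $i";
}
$archive = pack_entries($files);

$reads = 0;
$viaScan = find_sequential($archive, 'file100.php', $reads);

$manifest = build_manifest($archive);   // done once, e.g. when the archive is created
list($offset, $length) = $manifest['file100.php'];
$viaIndex = substr($archive, $offset, $length);

var_dump($reads);                  // int(100): every header was visited
var_dump($viaScan === $viaIndex);  // bool(true): same bytes either way
```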

As if that weren’t enough, re-implementing either phar or PHK to use
the TAR format would result in ‘“pseudo-tar” files’; archives that
could be viewed and extracted using tar itself, but which could
be generated only through the extension. Files generated using
tar would appear to be the same, but would not work with
either extension…

Short version: Greg finally gets François on-side – but is it too late?

TLK: Constant folding optimization [continued]

Following on from last week’s discussion with Stas, Nuno Lopes had tried
implementing support for
“constant expressions within constant contexts.” He reported that it was much
more complicated than he’d expected; his modification to the
static_scalar grammar rule had produced hundreds of
reduce/shift ambiguities. Ignoring the compiler
warnings led to a version of PHP that couldn’t process PHP scripts…

Dmitry went back to the initial proposal, and spotted a problem with the
concept. If constant expressions were to be supported, constants within
expressions should also be supported – but the values of those constants may
be unknown at compile-time. Delaying constant initialization for expressions
would entail keeping an Abstract Syntax Tree, which would need to be
evaluated at run-time by zval_update_constant().

Stas didn’t see why constants should be supported in expressions across the
board, but Dmitry pointed out that it doesn’t make a lot of sense to support
constant expressions and not support constants inside them. Marcus backed
Dmitry’s observations; he’d looked into this some time ago, and had found
that it isn’t a trivial challenge. All too often in PHP there are constants
that would be enumerated flags in other languages, e.g.


class Week {
    const Monday = 0;
    const Tuesday = Monday + 1;
    // ...
}


or:


class Logging {
    const INFO = 0;
    const WARN = INFO + 1;
    const FAIL = WARN + 1;
    // ...
}


However, Marcus would like to see support for it, if it proved possible;
‘it would make my enum implementation in pecl/spl_types more handy’.
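
As an aside for readers on later PHP versions, here is a sketch of the enum-style pattern Marcus had in mind. It assumes PHP 5.6 or later, where constant expressions in class constants are accepted; nothing in this thread settled that. Note that the earlier constant is referenced through self::, since a bare name would be looked up as a global constant.

```php
<?php
// Assumes PHP 5.6 or later; this would not have parsed when the thread
// was written.
class Logging {
    const INFO = 0;
    const WARN = self::INFO + 1;
    const FAIL = self::WARN + 1;
}

var_dump(Logging::FAIL);   // int(2)
```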

Short version: So that’s what they’re talking about!

FIX: run-tests.php

Zoe Slattery wrote to the QA list with a problem. The following test script
would pass:


--TEST--
Testing regex matching in run-tests.phpt
--FILE--
<?php
echo "Warning: something wrong in function red at line 10\n";
echo "Warning: something wrong in function green at line 13\n";
echo "Write whatever you like, it will be swallowed at line 16\n";
?>
--EXPECTF--
Warning: something wrong in function %s at line %d
Warning: something wrong in function %s at line %d

The only way to make the test respond as expected was to put the actual line
numbers in the output rather than using %d. Should Zoe now
recommend that tests be written with the actual line number as expected
output?

Marcus explained why not, but added, oddly, that he’d come across the issue
before. It took an attempted fix from Zoe involving extra comment lines
before he realized that %s was actually catching new lines and
the test suite was in fact broken, at which point he copied the exchange to
internals@. Zoe made it clear that regex isn’t her strongest point, and
Marcus offered her [^\n]+? as a potential solution. It seemed to
work, except that several tests now failed that didn’t fail before.

Nuno Lopes intervened at this point to propose a patch, explaining that it
wasn’t actually possible to fix the current regex for %s without
breaking everything in sight. His patch offered the choice of %s
as the result of matching [^\r\n]+, or %a as the
result of a match for .+. Applying it would mean that some
EXPECTF strings needed altering from %s to
%a, but the more important thing was that when testing the patch
Nuno had found test scripts with the wrong expected output. He was therefore
keen to get the fix committed.
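
The difference between the two meanings of %s can be reproduced in a few lines. This is a rough sketch rather than the code in run-tests.php: expectf_to_regex() is a hypothetical helper, and the placeholder handling is reduced to %s and %d. It feeds the output of Zoe’s test to both variants.

```php
<?php
// Hypothetical helper: turn an EXPECTF template into a regex body.
function expectf_to_regex(string $expectf, string $stringPattern): string
{
    $re = preg_quote($expectf, '/');
    // preg_quote() leaves '%' untouched, so the placeholders survive quoting.
    $re = str_replace('%s', $stringPattern, $re);
    $re = str_replace('%d', '[0-9]+', $re);
    return $re;
}

$output = "Warning: something wrong in function red at line 10\n"
        . "Warning: something wrong in function green at line 13\n"
        . "Write whatever you like, it will be swallowed at line 16";

$expectf = "Warning: something wrong in function %s at line %d\n"
         . "Warning: something wrong in function %s at line %d";

$loose  = expectf_to_regex($expectf, '.+?');       // can span lines under /s
$strict = expectf_to_regex($expectf, '[^\r\n]+');  // stays on one line

var_dump((bool) preg_match("/^$loose$/s", $output));   // bool(true): line 3 is swallowed
var_dump((bool) preg_match("/^$strict$/s", $output));  // bool(false): the extra line is caught
```

With the loose pattern the second %s stretches across the newline and absorbs the whole third line, which is why the test passed when it should have failed.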

Johannes Schlüter tested Nuno’s patch and was prepared to back it. Zoe,
however, was still hopeful of getting the problem fixed without having to
rewrite half the PHP test suite, and offered a different solution. She
believed that replacing


if (preg_match("/^$wanted_re$/s", $output))

with


if (preg_match("/^$wanted_re$/", $output))

would work, since that /s modifier is responsible for the
greediness of the match. Then she looked again, and withdrew her patch.

Marcus wasn’t sure that Nuno’s \r should be in there at all –
wouldn’t it cause problems to disallow it? – but Nuno mentioned Old Macs.
Zoe, though, reported seeing PCRE compilation warnings when she tried Nuno’s
fix. It took her some minutes to realize these were coming from unmodified
test scripts.

Short version: If you’re writing scripts for the PHP test suite,
take note.

PAT: array_get

Andrew Shearer turned up with a patch against CVS HEAD implementing the
array_get() function he proposed a couple of months ago, test script
and all. He added that someone had independently posted a similar
idea as a feature request for PHP 5,
and his implementation could be backported to fulfill that. The original
specification from the feature request made up the rest of his email.

Tony Dovgal wanted to know what the difference was, if any, between Andrew’s
proposal and:

<?php

function array_get(&$array, $key, $default) {
    if (isset($array[$key])) {
        return $array[$key];
    }
    return $default;
}

?>

Andrew explained that the main idea was to enable people to write cleaner PHP
code, since this is – or should be – an oft-used snippet. However, there were
some technical differences in array_get(); he referred Tony to
the spec. Marcus didn’t like it much either; he grumbled that it was
inflexible, and ‘far away from what ifsetor was meant to be’. It
couldn’t check whether the array exists; it couldn’t manage multilevel
queries; it couldn’t cope with other types in queries, such as objects.
ifsetor() could do all these things. There wasn’t the potential
to return a writeable reference so that a non-existing key could be created.
ifsetor() has that potential. Finally, Marcus disliked the idea
of passing the array to the function by reference; it would be very slow, and
was unnecessary here.

Andrew agreed with that last point; this wasn’t part of his function, just
part of someone else’s feature request. He’d love to see a workable proposal
for ifsetor(), but since there wasn’t one,
array_get() could help with many of its common use cases. You
could, for example, separate the existence checks for array and key with:


$value = isset($array) ? array_get($array, 'mykey') : FALSE;

or use nested calls to array_get() to achieve multilevel
queries. Contrary to Marcus’ earlier claim, array_get() in fact
has support for object members. The writeable reference returned by
ifsetor() would be great, if ifsetor() existed, but
he reminded Marcus that there has been no concrete proposal for it. Marcus
pointed him to the ?: shortcut he put into CVS HEAD as a halfway
house for ifsetor(). Andrew had already investigated it, and
explained that it doesn’t serve the same purpose because it throws an
E_NOTICE for missing values. The shortcut could, however, be
used together with array_get() to overcome the limitations of
both:


array_get($_GET, 'foo') ?: slowDefaultCalculation();

Marcus suggested using @ to silence the E_NOTICE.
Robert Cummings pointed out that this would be an expensive assignment; he
thought the whole point of ifsetor() was to kill undefined index
notices for the left operand. Marcus, however, believed this was the best that
could be hoped for in PHP 5.3, although there is still the possibility of a
real ifsetor() in PHP 6 if the team ever come to a consensus
over it. This came as a surprise to Andrew, who had been under the impression
that ifsetor() was rejected long ago. Still,
array_get() could provide the functionality that was most often
needed, while avoiding the issues that prevented acceptance of
ifsetor() – one of which was backward compatibility, since
there’s no way to write a userland version of it. Writing a BC function to
replace array_get(), on the other hand, was a simple matter:


if (!function_exists('array_get')) {
    function array_get($arr, $key, $default = false) {
        if (array_key_exists($key, $arr)) {
            return $arr[$key];
        } else {
            return $default;
        }
    }
}

Of course, accepting array_get() now would not preclude bringing
up ifsetor() again in the future…
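
The two userland versions quoted in this thread are not quite equivalent: Tony’s uses isset(), while Andrew’s fallback uses array_key_exists(). The sketch below shows where they part company. The function names get_isset() and get_key_exists() are hypothetical, chosen to avoid clashing with any real array_get(), and which behaviour the proposed built-in followed is not stated here.

```php
<?php
// Hypothetical names; both functions mirror snippets quoted in the thread.
function get_isset(array $array, $key, $default = false)
{
    return isset($array[$key]) ? $array[$key] : $default;
}

function get_key_exists(array $array, $key, $default = false)
{
    return array_key_exists($key, $array) ? $array[$key] : $default;
}

$row = ['name' => 'php', 'nickname' => null];

var_dump(get_isset($row, 'nickname', 'n/a'));       // string(3) "n/a": isset() treats null as unset
var_dump(get_key_exists($row, 'nickname', 'n/a'));  // NULL: the key exists, so its value comes back
var_dump(get_isset($row, 'missing', 'n/a'));        // string(3) "n/a": both agree on absent keys
```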

Lukas Smith believed it would. He didn’t see that anything material was
actually preventing an ifsetor() implementation; he uses the
kind of functionality it could offer on a daily basis, and as such would
prefer it to be an operator and available in PHP 5.3. Marcus’ shortcut
wasn’t helpful to him; Andrew’s solution was close enough to solve his real
world needs, and ‘if I could just see the slightest bit of a real argument
against ifsetor(), I might even vote for array_get()’, despite his dislike
of the name.

Marcus didn’t have a new name for it. However, he felt that, if
array_get() were to go into the core as a halfway solution it
should be made more useful.

Short version: Not the Holy Grail, but a shiny cup nonetheless.

RFC: Multiple namespaces per file

Following community feedback over the namespace implementation, Stas wrote,
he and Dmitry had tried to find a simple model that would allow multiple
namespaces per file, and would allow multiple namespaced files to be bundled
together without modifications. They’d arrived at the conclusion that it was
possible, but only if a file containing namespaces contains no code other
than namespaced code. For example,

class X {}
namespace A;
class Y {}


or

require 'foo/bar.php';
namespace A;
class X {}


wouldn’t work, but

namespace A;
class X {}
namespace B;
class Y {}


or

namespace A;
require 'foo/bar.php';
class X {}
namespace B;
class Y {}


would be fine. The question was, would this be an acceptable solution for
those who had wanted multiple namespaces per file? Just to be safe, Stas
added a postscript: ‘This is *not* a “should we use braces” thread, so
please don’t :)’.

David Coallier was the first to respond. He wanted to know if it would be
possible to have something like an endnamespaces keyword that
would act as a delimiter and close each namespace definition. Stas
immediately spotted that this was in fact the braces argument without the
braces. He reiterated the main problem with it: when several namespaced files
are combined, global spaced code in amongst the namespaced code starts to look
and act in a ‘seriously weird’ way. David backed down, and agreed that
the solution on offer would actually work well for him. However, he’d like to
know whether somefile.php would now be included in the B
namespace in the following scenario:

<?php

namespace A;
require 'somefile.php';
class AX{}

namespace B;
class BX{ }

?>

Stas pointed out that somefile.php might well contain a namespace
definition of its own; it would be parsed as an entirely separate entity.
Besides, require is evaluated at run-time, so cannot influence
compile-time namespaces. The namespaces and the included file can’t know
about each other.
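
Stas’ answer can be checked with a small experiment. This is a sketch under stated assumptions: it presumes a PHP version in which the multiple-namespace proposal was adopted as described, and the temporary directory, file names and the Helper class are made up for illustration. The bundle is written to disk and then included, so the script itself needs no namespace declaration.

```php
<?php
// Sketch only: file names and class names below are made up for illustration.
$dir = sys_get_temp_dir() . '/ns_demo_' . uniqid();
mkdir($dir);

// A plain file with no namespace declaration of its own.
file_put_contents("$dir/somefile.php", "<?php\nclass Helper {}\n");

// Two namespaced "files" bundled into one, with no global code in between.
$bundle = "<?php\n"
        . "namespace A;\n"
        . "require __DIR__ . '/somefile.php';\n"
        . "class AX {}\n"
        . "namespace B;\n"
        . "class BX {}\n";
file_put_contents("$dir/bundle.php", $bundle);

require "$dir/bundle.php";

var_dump(class_exists('A\AX'));      // bool(true): declared under namespace A
var_dump(class_exists('B\BX'));      // bool(true): declared under namespace B
var_dump(class_exists('Helper'));    // bool(true): the required file stays global
var_dump(class_exists('A\Helper'));  // bool(false): it is not pulled into A

// Tidy up the temporary files.
unlink("$dir/bundle.php");
unlink("$dir/somefile.php");
rmdir($dir);
```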

Marcus felt that the whole thing looked ‘very messy’; he’d prefer to
either go with the curly braces or stay with the single namespace declaration
at the head of a file. This implementation lacked clarity, and even if the
maze of namespaces panned out as expected you’d need an IDE to get through
them.

On the other hand Lukas, who had been one of those requesting multiple NS per
file, believed the limitations would be acceptable.

Dmitry explained to Marcus that you can actually spread one namespace across
several files, both in the current implementation and in the proposed
multiple namespace implementation. Greg blamed Stas’ initial explanation for
the lack of clarity. The goal wasn’t to encourage ‘a development
paradigm’ of multiple namespaces in the same file, but to make it
possible to have multiple files containing namespaces combined into a single
file. Greg found the proposed syntax difficult to read, but saw this as a
good thing; it would discourage development along those wrong lines. If
everyone viewed the proposal this way, they’d see it made much more sense:
‘It’s basically one-namespace-per-file, but PHP allows you to virtually
combine files so that at each namespace declaration, the imports are
reset.’

Greg was too late to persuade Larry Garfield, who hated the syntax because it
doesn’t follow the structure of the parser. Larry did agree, though, that mixing
namespaced code with global code would be ‘all kinds of confusing’,
however it was managed.

Short version: This is starting to sound like the goto discussions.

TLK: Accessing global namespace

Richard Quadling was confused about global namespaces. If you were operating
within a given namespace, and an entity within that namespace happened to
share its name with some entity in the global namespace, you wouldn’t need to
use the namespace prefix for the former. How, then, would you access the
entity with the same name existing in the global namespace?

Emil Ivanov believed there was a null prefix. To differentiate between
namespaced and global instances of, say:


classB->method_c()

from within a namespace, you’d need to reference:


::classB->method_c()

in order to access the one in global scope.

Marcus concurred, explaining that this solution a) means no new keyword is
needed and b) is in line with other languages.

Short version: It’s easy when you know how.

FIX: mail.force_extra_parameters

Stas had a proposal to disallow setting
mail.force_extra_parameters from .htaccess. He reasoned
that the directive allows arbitrary arguments to be passed to the mail
binary, and some mail tools will take parameters that allow the
reading and writing of arbitrary files. Stas believed that
mail.force_extra_parameters should only be altered by the
systems administrator in any case, so he didn’t see a problem with removing a
way to override the system settings. Did anyone have any objections?

There were no dissenting voices, and Stas committed the change into CVS HEAD
and PHP_5_2 at the end of the week.

Short version: Another security hole bites the dust.

RFC: Marking functions as const

Following the demise of his patch for constant folding optimization last week,
Nuno came up with a new proposal. Some functions, when fed with constant
arguments, will always return a constant value – for example,


strlen('abcd') === 4;


In such cases, it would be possible for an optimizer to do the
transformation. In fact, he believed that Ilia already had a list of such
functions in his own optimizer. Nuno thought it would be better to have that
list in the PHP core, so that everyone could benefit from it. All that would
be required would be a change from PHP_FE() to
PHP_CONST_FE() in the function tables for the affected
functions.

Nuno supplied a link to his new patch, which also contains a few function
entries that have already been changed.

Hartmut Holzgraefe noted that the example Nuno had given would be called
“deterministic” rather than “const”. Nuno agreed, but added that his proposed
changes shouldn’t have any additional side effects.

PHP user Peter Brodersen liked the idea, but wondered if the ability to
change the encoding charset at run-time wouldn’t break the example? He
assumed that it would be important to be aware of settings that could
potentially alter the results of individual functions, and use that awareness
to decide whether a function really is deterministic at this level.
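
The kind of transformation an optimizer could make with such a list might be sketched as follows. Everything here is hypothetical: the whitelist array stands in for the information PHP_CONST_FE() would carry in the function tables, and the “call node” is a toy structure, not anything from the Zend Engine or from Ilia’s optimizer.

```php
<?php
// Hypothetical whitelist of functions treated as deterministic.
$deterministic = ['strlen' => true, 'abs' => true];

// Toy "call node": a function name plus arguments that are either literal
// values or ['var' => name] placeholders for values unknown at compile time.
function fold_call(array $node, array $deterministic)
{
    if (!isset($deterministic[$node['name']])) {
        return $node;                      // not on the list: keep the call
    }
    foreach ($node['args'] as $arg) {
        if (is_array($arg)) {
            return $node;                  // non-constant argument: keep the call
        }
    }
    // Safe to evaluate now and replace the call with its result.
    return call_user_func_array($node['name'], $node['args']);
}

$folded = fold_call(['name' => 'strlen', 'args' => ['abcd']], $deterministic);
$kept   = fold_call(['name' => 'strlen', 'args' => [['var' => 'x']]], $deterministic);

var_dump($folded);           // int(4): the call was replaced by its value
var_dump(is_array($kept));   // bool(true): left as a call for run-time
```

Peter’s concern maps directly onto the whitelist: any function whose result depends on run-time settings would have to stay off it, or be folded only when those settings are known.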

Short version: It’s probably trickier than it sounds.

CVS: Namespace/autoload problem resolved

Changes in CVS that you should probably be aware of include:

  • In ext/mysql, bug #42549
    (ext/mysql failed to compile with libmysql 3.23) was
    fixed [Scott MacVicar]
  • Zend Engine bug #42590 (Make
    the engine recognize \v and \f escape sequences)
    was fixed [Ilia]
  • In the CGI SAPI, bug #42587
    (behaviour change regarding symlinked .php files) was fixed
    [Dmitry]
  • Session bug #42596
    (session.save_path MODE option does not work) was
    fixed [Ilia]
  • Alpha support for imagefilter’s IMG_FILTER_COLORIZE was
    added to both libgd and the gd extension [Pierre]
  • Core bug #39651
    (proc_open() append mode doesn’t work on Windows)
    was fixed [Nuno]
  • In ext/bz2, bug #42627
    (bz2 extension fails to build with -fno-common) was fixed
    [Jani]
  • PDO bug #42643 (CLI segfaults
    if using ATTR_PERSISTENT) was fixed [Ilia]

In other CVS news, Dmitry committed his namespace/__autoload()
solution into CVS HEAD. In cases of ambiguity, __autoload() will
now be called only after checking for the classname both in the current
namespace and in the global (internal class) namespace. Marcus, who had just
been through a very long discussion with François, Paweł
Stradomski and Stut about the differences between classes and interfaces in
PHP and the reasons why autoloaders would need to be something
completely new to cope with constants and functions, greeted Dmitry’s commit
with a heartfelt ‘nice work!’

Back in the 5_2 branch, Andrey Hristov fixed an apparently unreported
ext/mysqli bug he thought had been fixed long ago, which caused
Windows builds to throw a warning and leak memory on thread exit.

Andi extended Stas’ karma and unleashed him onto the CVS account waiting
list. Stas promptly exercised his new rights by giving the entire ICU group
access to the unicode module in PECL.

Short version: David Wang still didn’t get his CVS account.

PAT: LSB, GC discussions

Etienne Kneuss finally responded to Dmitry’s request for a comparison
between his most recent late static binding
patch and the one the Zend team had come up with. Etienne explained that he
was on vacation and didn’t have internet access on a regular basis. As for
Dmitry’s question about callback support, he’d deliberately left
support for callbacks out of his patch because he’d planned to do some
clean-up work there.

Etienne had managed to look through the patch Dmitry sent, and noted that the
Zend implementation seemed quite different to – and much larger than – his
own. However, he had been unable to patch it onto current CVS HEAD. Could
Dmitry please post an updated version? Dmitry promptly obliged.

Marcus, meanwhile, was back onto the garbage collection theme. He felt that
David Wang’s reference macro patch should go directly into PHP 5.3, since
binary compatibility would be broken there anyway. His only query was whether
it would make sense to use the __ prefix internally, in the same
way that it’s already used in PHP. Marcus also put in a plea against
‘magic switches that lead to broken code’. Wasn’t it more normal
policy for Zend to try to break new things to find where the problems lay,
rather than to try and avoid breakage at the outset?

Cristian Rodriguez got the wrong end of the stick and assumed Marcus was
talking about an INI switch for garbage collection, which led him to grumble
yet again about ‘the unicode.semantics switch thingy’. David Wang put
Cristian straight; the refcount manipulation macros are just
that, and have nothing to do with garbage collection. In fact, he added,
calling the switch ZEND_GC was misleading. He neglected to
mention that it’s also a compile-time switch.

Short version: A week of confusions.