# Zend Weekly Summaries Issue #355

19th August – 25th August 2007

### TLK: unicode.semantics (for the last time?)

David Coallier doesn’t like it when long, long threads, such as the “What is
the use of unicode.semantics in PHP 6?” one, end silently. He demanded to
know whether that flag is actually to be removed or not. ‘What is
happening here in the background that we are not seeing?’

Andrei Zmievski explained that the team members are at an impasse over it
(or, as Tony Dovgal wrote, ‘It’s stuck’). Andrei added that he
personally would be fine with removing ‘the damn switch’ and having a
Unicode-only PHP 6, if only to avoid having that discussion again. A jubilant
Stas Malyshev restrained him, pointing out that there were still plenty of
soldiers in both camps. Jani retorted that if Andrei, as leader of the
Unicode effort, was okay with getting rid of unicode.semantics,
then to him it sounded very much like its death knell. Stas argued
that this wasn’t actually consensus. Pierre-Alain Joye argued that actually
it was. Andrei simply asked Stas if he had any other concrete proposals for
breaking the deadlock, and Derick Rethans muttered that everyone but Stas
seemed to have reached consensus already.

Andi Gutmans weighed in, pointed out that Zend have implemented and maintain
most of the Unicode support, and made it very clear that he saw no reason to
remove the unicode.semantics switch. Unfortunately, he mentioned
the main reason most of the opposition want the switch gone – the performance
hit, which will impact all PHP 6 users, not only those using Unicode.
Andi then asked that the discussion continue elsewhere, among those directly
involved in the project. Luckily, the caffeine reached his system before
anyone but Derick had time to protest, and he posted again. This time, he
asked for volunteers to port their applications to PHP 6 so that the
performance impact could be properly assessed, rather than guessed at. He
also hoped for help in figuring out the feasibility of a migration script to
update PHP 5 code for PHP 6. Lukas Smith pointed out that nobody on the
internals list was likely to have time to investigate either thing, and
suggested a public coding contest. Andi, though, considered this part of the
implementation effort. To him, the worst thing would be to make an arbitrary
and uninformed decision at this point.

Derick wrote rather scathingly that he didn’t see people queuing up to port
100,000 lines of code just to see if it would work, and there wouldn’t be
much point in porting anything less. Andi retorted that his team have already
started porting the Zend Framework, which contains rather more LOC than
100,000, and have found a number of issues through the attempt. Part of the
benefit of the exercise lay in discovering what could or could not be
automated; ‘preferably we’ll have some docs and scripts available for our
users with PHP 6 and not just a bunch of bits with a “good luck”
message.’ Couldn’t Derick’s team try it with ezComponents?

PHP user Lester Caine wondered how much work was actually involved in porting
applications to run under PHP 6 these days, and mentioned that he’d be
prepared to spend time on ‘at least starting to have a look’. He
followed this up with a fairly long diagnosis that made it plain he wasn’t
fully armed with the facts when it came to Unicode. Johannes Schlüter kindly
took the time to explain exactly what the issues were, in his own experience,
when porting PHP 5 applications to PHP 6. He concluded that it’s perfectly
possible to make applications that will run under both PHP 5 and PHP 6 with
Unicode support switched on, with the caveat that ‘“run” there means “it
works but won’t benefit from the unicode stuff”’.

Lukas reminded Andi that the “remove the switch” camp were committed to
back-porting most of the functionality in PHP 6, and added that there had
also been an off-list recommendation to make PHP 5.3 as forward compatible as
possible. He also reiterated the chief argument, that maintaining the switch
in CVS HEAD is much the same as maintaining two branches of PHP, and in some
cases even harder. Finally, although Lukas was well aware of the benefits of
having people test-port real-world applications, it didn’t make sense to him
to focus on large applications at a stage when the performance impact and
migration time are not known, but the increase in maintenance is.

Andi argued that retaining the switch would make it easier for users to
upgrade; it would also give them the choice of whether to go Unicode or not.
He had concerns about how long PHP 5 would need to be maintained, and didn’t
see how maintaining a single branch with both Unicode and non-Unicode options
could mean more work than maintaining two separate branches. The biggest
problems with supporting the switch are in the Zend Engine, and the Zend team
have taken responsibility for most of the maintenance there. Finally, their
users, which was why he wanted to go back to the idea of making an automated
update script. Still – Andi was prepared to listen…

Lukas felt that BC was not the main issue with the Unicode switch; the
feedback he’d heard on the subject said that making code work on all versions
was possible, albeit with some work. To him, therefore, the argument could
only revolve around performance. Perhaps it was possible to figure out some
ballpark figure for the performance loss without actually porting any major
applications?

Google SoC student Nicolas Bérard-Nault intervened. He’d ported an
application – the Jaws CMS – to PHP 6 as part of his GSoC project, and had
found the unicode.semantics switch very frustrating. It meant
taking into consideration not only the two scenarios (off or on), but also
the possibility that the switch might be turned off or on at any point in the
future, regardless of the setting at the point of installation. Nicolas gave
the specific example of coping with a serialized configuration file saved
with unicode.semantics on.

Effectively, wrote Nicolas, having the switch means that all strings
throughout a PHP 6 application need to be explicitly cast, because there’s no
way to know what you’ll be given when you ask for a string. This had huge
ramifications; it would mean widespread education for PHP users, and it also
meant that PHP strings are no longer loosely typed. All in all, the switch
could only be of benefit to those able to control their environment. In
Nicolas’ view, “the damn switch” was a bad idea: ‘It removes a good part
of what made PHP a success: simplicity. Get rid of it once and for all.’

Rasmus Lerdorf pointed out that the problem of encoding clashes isn’t
actually limited to the Unicode switch; you could have the exact same issue
with Big-5 and Shift-JIS. Andi, though, had listened. He felt it was a
mistake, and that it risked a split in the user base, but if the majority
wanted to remove the switch… In his view, the best way forward for PHP 6
adoption would be to help the most popular PHP applications make the leap to
Unicode support. More immediately, the team should start identifying which
PHP 6 features should be back ported to PHP 5.3.

Short version: The nays have it.

### TLK: Multiple namespaces – and brackets

Greg Beaver was back on the namespace brackets thing. He offered up a patch adding the
syntax namespace {*stuff } and allowing multiple namespaces per
file. Greg claimed that this would bring no performance penalty or added
complexity, and the patch itself was extremely simple. It supported a single
multiple namespace use case, aimed squarely at application developers wanting
to use a namespacing solution for distribution. The recommended format for
development should still be namespace OneRingToRuleThemAll;
build tools could then easily take a bunch of files and store all their
namespace declarations in one place. Tests were included.
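For readers who want to see the shape of the idea, here is a minimal sketch of several bracketed namespaces in one file. It uses the syntax that eventually shipped in PHP 5.3 (with backslash separators rather than the `::` used in this thread), so it should not be read as Greg’s patch; the namespace and function names are made up for illustration.

```php
<?php
// Two namespaces combined into one file using the bracketed form.
// Library\Strings, Library\Numbers, shout() and twice() are hypothetical names.
namespace Library\Strings {
    function shout(string $text): string
    {
        return strtoupper($text) . '!';
    }
}

namespace Library\Numbers {
    function twice(int $n): int
    {
        return $n * 2;
    }
}

// With the bracketed form, global code has to live in an unnamed block.
namespace {
    echo \Library\Strings\shout('hello'), "\n";
    echo \Library\Numbers\twice(21), "\n";
}
```

A build tool could emit a file like this from separate sources, which is the distribution use case described above.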

David Coallier asked eagerly for someone to commit the patch, quickly, ‘so
no-one can say anything’. Jani wrote something very similar, but Rasmus
intervened. He pointed out that Greg believing there was no performance
penalty didn’t make it so. In fact, Rasmus couldn’t see how the
symbols could be resolved at compile time if the patch were included; that
meant symbol resolution had been moved to the executor, which would imply
‘a huge performance penalty’.

Greg was certain this wasn’t the case. The only difference was that some PHP
code might need to be read prior to the T_NAMESPACE token,
because introducing zend_do_end_namespace() meant the namespace
could be modified or removed mid-parse. Rasmus argued that breaking the
single namespace per file rule would make opcode caching a nightmare; where
currently there doesn’t need to be a secondary namespace lookup in a single
op_array, there would now need to be multiple checks. And having
nested namespaces would really be a mess. Given:

  

bar.inc‘s functions might be A::B::func() or B::func()
depending on how the file is included, again preventing compile-time
optimization. Greg pointed out that his patch explicitly disallows nesting;
that code would give two separate namespaces, global and B.
Besides, the namespace processing itself would still be per file. When
it came to opcode caches, there would be more complexity – but Greg felt that
by implementing the feature essentially as macro expansion that snippet would
come down to:

  

Was he missing something?

Rasmus explained that opcode caches – including APC – will replace class and
function definitions with no-ops in the cached op_arrays,
leaving the resolution to compile time. Things like autoload and conditional
includes wreck opcode caching optimization, because they require the
runtime context. Class inheritance that can span includes, and code that
his team. It was possible that Greg’s patch didn’t make things worse for APC,
but Rasmus was concerned about the ability to change the namespace mid-file.
It might work at compile-time – so long as no includes or imports were
used to break the namespace/file association. In particular, it was important
to avoid anything that might allow a single file to change its namespace
depending on how it was included.

Larry Garfield double-checked; did Greg’s patch mean that

  

would be equivalent to

  

and if so, what would happen if a developer mixed them? Would the compiler
have a heart attack if bracketed namespaces were nested? Stas retorted that
the real fun came with code like:

  

where you’d need to count brackets to know which namespace bar
was in and how to call it. As for nesting – that was a can of worms he
wouldn’t want to open… Greg intervened to point out that either mixing or
nesting would in fact throw a fatal compiler error, which quietened Stas’ nerves.

Dmitry Stogov wanted to know how import statements would behave
with Greg’s patch, since an import in one namespace would affect anything
that followed. Greg initially saw this as a documentation issue, but thought
it would be necessary to expand the imports if two files were combined that
included them. Stas pointed out that import has no relationship
with files; it simply works on names. Throwing a fatal error when a class
name conflicts with an import name would prevent combining namespaces from
different areas – or at least, would mean having to know the use context in
advance. This was precisely the problem namespaces were supposed to solve!
Making import have an effect on other namespaces would mean that
code in one namespace could break code in another once the files were combined:

  

Greg clarified: by “expansion” he meant that all references to
bar would need to be translated to otherfoo::bar.
If this was problematic, perhaps there could be a separate import scope
within namespace brackets? It would necessarily mean that a top-level import
wasn’t the same thing as an import within a namespace… he wasn’t sure he
liked that idea, but attached a new patch implementing it anyway. Stas didn’t
like the idea either, but
agreed it would be the only way to make it work with any kind of consistency.

Rasmus, meanwhile, responded to a Golden Email Award candidate from
Guilherme Blanco asking for a more detailed
explanation of the opcode cache limitations. He explained that although
something like:

  

might logically suggest that any functions or classes defined in
bar.inc would be in the foo namespace, this was in fact
something to avoid because it would allow:

  

This would make things very complicated for an opcode cache, which wants to
resolve everything at compile time, because the contents of bar.inc
would have different names depending on the context of the include. Allowing
this would slow performance, even for code that didn’t use namespaces.

Marcus Börger agreed, and added that those limitations should stay. They
didn’t only make it easier for opcode caches; they also adhered to the KISS
approach of PHP.

Not a man to be downhearted, Greg offered up yet another patch offering
local import scope. This one actually removed the
original namespace syntax and replaced it with brackets, although still not
allowing nested namespaces. To ensure that import remained
global by default, Greg had used the rather weird and wonderful syntax:

  namespace MyNamespace unset import {} 

This way, he wrote, the odd import conflict could be handled on a
case-by-case basis; most users would never need the unset part,
since they would more usually be using import to set up a
namespace.

Greg’s explanation was actually a fair bit longer than this, and
M. Sokolewicz wrote that he’d had to re-read his post just to
figure out what was going on. He found unset import ‘odd, and
not very transparent’. Derick agreed; ‘the concept that was just posted
here has IMO a big WTF factor.’ Stas contented himself with terming
unset import a ‘very artificial concept’. It would be
irritating for the user to have to manually control all the imports; and what
would happen if the user wanted to keep one import and not another? Besides,
this didn’t really resolve the problem of namespaced code potentially being
affected by other code.

Greg went back to basics with his final patch. His intention by now was
simply to make it possible to combine
multiple namespaces into a single file; he’d given up on the brackets,
writing that they appeared to introduce more problems than they were worth.
(That part made Stas laugh out loud.) This patch would allow you to take two
files and create something like:

  

The only problem with it that Greg could see was that:

  

would wrongly combine into the one namespace. The workaround for
this was to insert a bogus namespace declaration for the contents of
file1.php, which was simple enough for toolmakers to manage in
userspace code.

Stas thought this solution livable, since it reduced the problems with
import. He wrote that the Zend team would need to check
thoroughly for weird use cases, but if they didn’t find any – Greg’s final
patch might just work.

Short version: Was it just me, or did Greg really get his way?

### TLK: Constants in namespaces

Yet more namespace stuff, this time from Dmitry. He offered a patch for review
to implement namespace support for constants. Constants could be declared
in namespaces, in the same way that classes can, and these namespaced constants
could be used in the same way as namespaced functions:

  

He hoped to commit his patch later in the week, if there were no objections.
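As a rough illustration of what namespaced constants look like in practice, here is a sketch written against the syntax that later shipped in PHP 5.3, where the separator is a backslash rather than `::`. The `Acme\Config` namespace, the `TIMEOUT` constant and the `timeout_ms()` function are invented for the example and are not taken from Dmitry’s patch.

```php
<?php
namespace Acme\Config {
    // A namespaced constant, declared with const much like a class constant.
    const TIMEOUT = 30;

    function timeout_ms(): int
    {
        // An unqualified name is looked up in the current namespace first.
        return TIMEOUT * 1000;
    }
}

namespace {
    // From outside, the constant is reached by its fully qualified name,
    // in the same way as a namespaced function.
    echo \Acme\Config\TIMEOUT, "\n";
    echo \Acme\Config\timeout_ms(), "\n";
    echo constant('Acme\Config\TIMEOUT'), "\n";
}
```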

Johannes liked the idea, particularly since he believed – on looking through
the patch – that const would also work outside namespaces,
allowing compile-time constants anywhere. Dmitry agreed that they would work
outside namespaces, but explained that they wouldn’t be compile-time; they’d
be set during execution, similarly to define(). Pierre mentioned
pecl/hidef at this point; he thought it would be nicer to get them
resolved at compile time, ‘at least when no instructions are involved in
the definition’. Dmitry argued that the value of a constant might be
unknown at compile time, but Pierre pointed out that it quite often isn’t;
obviously expressions, functions or double-quoted strings would fit into that
category.

Tony just thought it would be a bad idea to use the same const
syntax for both runtime and compile-time behaviours. Dmitry and Stas both
acknowledged his point. Stas thought that “constant constants” would be much
less useful; many uses of constants imply runtime evaluation, and although
const in classes doesn’t allow runtime expressions, there is at
least the option to store expressions in per-class variables. There is no
such option in namespaces. Perhaps a new keyword was needed… or a
modification for define()… or a new function, such as
ns_define(), that would magically add the namespace prefix to
the constant?

Tony, pointing out that this basically meant Stas saw class constants as
useless in their current state, explained that he didn’t feel it was
required that they become more useful – it would just be nice to have.
Consistent behaviour would be preferable to inconsistency with extra features.
Stas replied more cautiously; his point had been that class variables and the
global define() offer ways around the limitations of
const, whereas there are no workarounds for compile-time
namespace constants. Making const useful there would be better.
He didn’t really see a problem with re-using the keyword either; ‘I don’t
say it’s necessarily a good thing, but it’s not without precedent’, he
wrote, citing static.

Tony retorted that ‘useful things should not bring confusion’; there
are already runtime constants available through define(), and he
didn’t see a critical need for runtime namespace constants. Stas pointed out
that

  define(__NAMESPACE__.'::foo', 'bar'); 

frankly sucks, whereas

  const foo = 'bar'; 

looks much nicer. However, Tony was welcome to propose less confusing syntax.
Tony promptly proposed that either the namespace constants should be defined
at compile time, like class constants, or class constants should be defined
at runtime too. He didn’t care particularly which way it went. Stas pointed out
that the latter isn’t really an option, and Dmitry agreed. He felt the patch
needed to be trimmed to use only compile-time constants in namespaces, and
all runtime constants would need to use the existing define().
He also thought Stas’ idea of extending define() to allow
namespaced constant creation was a possibility:

  function define(string $var, mixed $val, bool $namespace_constant = false);

Tony and Johannes both agreed in principle, but Johannes pointed out that
define() already takes a flag as its third parameter, so this would be its
fourth:

  function define(string $var, mixed $val, bool $case_insensitive = false, bool $namespace_constant = false);

He’d rather have

  define(__NAMESPACE__.'::FOOBAR', $value);

than

  define('FOOBAR', $value, false, true);

Dmitry agreed that the latter ‘looks terrible’, and left
define() alone. Stas pointed out that it would’ve been difficult
to implement anyway, since the namespace name isn’t known at runtime.
define() would need to be an operator for that to happen, or
else the actual namespace name should be passed as an argument. How about
adding an operator that would prepend __NAMESPACE__.'::'. to the
constant? Something like:

  define(ns_fullname('FOOBAR'), $value);

would look better, in his opinion, although he didn’t even like the name
ns_fullname() himself. Tony was amused: ‘Special FUNCTION to do
“__NAMESPACE__.'::'”?! Sheesh…’ but Stas swiftly corrected him: he’d meant
operator, of course. Greg offered up:

  define_ns('FOOBAR', $value);

but Stas had been there already since suggesting it himself. He’d dismissed
it, partly because it was a combined function and operator, and partly
because operators with underscores simply don’t look good.
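The trade-off being argued over can be sketched roughly as follows. This uses the behaviour of released PHP (5.3 and later, backslash separator), which may differ from the patch under discussion at the time; the `Acme\Build` namespace and the constant names are hypothetical.

```php
<?php
namespace Acme\Build {
    // Compile-time style: short to write, but the value has to be a
    // constant expression.
    const VERSION = '1.0';

    // Runtime style: any expression is allowed, but the namespace prefix
    // has to be spelled out by hand.
    define(__NAMESPACE__ . '\STARTED_AT', time());
}

namespace {
    echo \Acme\Build\VERSION, "\n";
    echo \Acme\Build\STARTED_AT, "\n";
    var_dump(defined('Acme\Build\STARTED_AT'));
}
```

The second form is the one Stas described as ugly, which is why the thread kept circling around a shorthand for the prefix.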

Short version: const will remain as it is. Now available in a namespace near you!

### NEW: PHP 5.2.4 RC3

Ilia Alshanetsky, as Release Master for the PHP 5.2 series, announced the
availability of PHP 5.2.4 RC3 for testing:

Dale Walsh complained that the Release Candidate failed to generate a working
binary for Mac OSX 10.4.10 on an Intel machine. He didn’t think it was worth
his spending time on it though, because it was far more important to him to
discover why PHP 5.2.3 wouldn’t generate a thread-safe binary for Apache 1.3.
Ilia replied calmly that he had no problems whatsoever compiling on his Intel
Mac; perhaps Dale had enabled a problem module? The second issue was more
straightforward; the Apache 1 SAPI doesn’t need to be thread-safe because

Dale agreed with the module diagnosis, but hadn’t figured which was to blame
at that point. However the point of building a thread-safe Apache binary was
to generate a single set of PHP modules that could be used with both Apache 1
and Apache 2. Rasmus explained that building it threaded would give a
significant performance hit, but Dale didn’t see that as an issue. He wrote
rather a lot more about his project, none of which has anything to do with
internal PHP development whatever so I’ll leave it there. Rasmus was only
interested in Dale’s avowal that it wasn’t possible to build for Apache 2
without building threaded, given that the default Apache MPM in most distros
is actually the non-threaded prefork MPM.

The ever-faithful Edin Kadribasic intervened with his announcement that the

Dale sent rather a lot of emails about his project following this. Again
nothing to do with the development of PHP, and so not of interest to either
internals@ or these archives.

Short version: There’s reporting a bug… and then there’s noise.

### TLK: Integrating PHP with a homegrown server

And now for something completely different. PHP user Steve Francisco had
successfully written a simple Java-based (oxymoron) server that listens on
port 80 and serves files. He wanted to extend the project to support PHP, and
was looking for guidance. Steve’s initial attempt had involved calling PHP CLI
to capture the HTML and send it back to the caller; this worked for basic
files, but he had problems coping with query strings in URLs. Basically, he
hadn’t found a way to manage

  echo $_GET["parm1"];

using CLI.

Robert Cummings suggested running a PHP script with a function capable of
reading a file written by Java and populating the PHP globals with the
content as appropriate.
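Robert’s suggestion might look something like the sketch below. It assumes, purely for illustration, that the Java server writes the raw query string of the request to a file before invoking the CLI; the function name `load_get_from_file()` and the file format are not from the thread.

```php
<?php
/**
 * Populate $_GET from a file written by the calling server.
 * Assumption: the file holds a raw query string such as "parm1=hello&n=2".
 */
function load_get_from_file(string $path): void
{
    $query = trim((string) file_get_contents($path));
    parse_str($query, $params); // URL-decodes names and values into an array
    $_GET = $params;
}

// Stand-in for the Java side: write the query string to a temporary file.
$requestFile = tempnam(sys_get_temp_dir(), 'req');
file_put_contents($requestFile, 'parm1=hello%20world&n=2');

load_get_from_file($requestFile);
unlink($requestFile);

echo $_GET['parm1'], "\n";
```

A real script would still want to validate whatever it reads before trusting it.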

Mikko Koppanen suggested using CGI rather than CLI.

Daniel Brown wrote about argv. Steve would need to transpose the
$_GET variables from the request to argv variables; the code would look
roughly like:

  <?php
  foreach ($_GET as $p => $v) {
      $data .= " ".$p."=".$v;
  }
  exec('which php '.$filename.$data,$ret); // This would work on Linux....
  // exec('X:\path\to\php.exe '.$filename.$data,$ret);
  ?>

If a PHP script needed GET variables, reverse-transposing them would be a
simple matter:

  

give or take the odd line of input sanitation. Steve thanked Daniel; he’d now
seen a way. He could set an auto_prepend_file containing something like the
snippet Daniel had given, and all the Java side would need to do would be to
set up the request parameters as argv. That, it seems, was pretty much what
Daniel had intended in the first place.

Short version: Clever stuff.

### CVS: zend_alter_ini_entry strife

Changes in CVS prior to PHP 5.2.4 RC3 that you should probably be aware of include:

- In the mbstring extension, mbfl_strrpos() gained negative offset support, closing bug #42085 [Rui Hirokawa]
- The configuration switch --disable-rpath should work properly now [Jani]
- ext/curl now has support for the CURLPROXY_SOCKS4 option [Sara Golemon]
- ext/pdo_sqlite is no longer marked EXPERIMENTAL [Ilia]
- In ext/spl, bug #42364 (Crash when using getRealPath() with DirectoryIterator) was fixed [Johannes]
- In ext/pgsql, bug #42368 (Incorrect error message displayed by pg_escape_string()) was fixed [Ilia]
- Zend Engine bug #42009 (is_a() and is_subclass_of() should NOT call autoload, in the same way as “instanceof” operator) was fixed [Dmitry]
- In ext/soap, bug #42183 (classmap causes crash in non-wsdl mode) was fixed [Dmitry]
- Core bug #42365 (glob() crashes with invalid flags) was fixed [Jani]
- Sessions bug #37273 (Symlinks and mod_files session handler allow open_basedir bypass) was fixed in 5_2 [Ilia] and ported to CVS HEAD [Jani]
- version_compare() now understands “RC” even when it’s in lower case [Derick]

In other CVS news, Tony reverted part of a fix Ilia had made in the Zend
Engine back in June for a memory_limit
interruption vulnerability in zend_alter_ini_entry(). Tony explained that

  if (stage == ZEND_INI_STAGE_ACTIVATE && modify_type == ZEND_INI_SYSTEM) {
      ini_entry->modifiable = ZEND_INI_SYSTEM;
  }

breaks multithreaded servers. Stas double-checked; didn’t this mean
php_admin_value could be overridden with php_value? Tony concurred, but added
that the only harmless solution he could see would mean breaking binary
compatibility – not an option in a micro release. Stas suggested fixing it at
SAPI level, but admitted that he wasn’t sure how to work with ini_set() from
there.

Jani felt there’d been a bit of a misdiagnosis; he was able to reproduce the
problem with FastCGI, which isn’t multithreaded, and blamed the whole thing on
an EG(error_reporting) setting elsewhere which meant the @ silencer was
required for a zend_alter_ini_entry() call in zend_vm_def.h. That said, he
added that Apache already avoids the php_admin_value override potential at
SAPI level with a function named merge_php_config().

Dmitry ignored the SAPI/INI debate and got on with his namespace improvements
in CVS HEAD, bringing us optimizations, namespace constants and a fix for name
resolution which he explained as follows:

Short version: Namespaces are probably worth testing now.

### PAT: LSB, debug_backtrace, setcookie2

Sebastian Bergmann was first up in the patch department this week, offering
something in the way of a minor fix. Sebastian explained that he’d added a new
field to the debug_backtrace() result array back in PHP 5.1.1. This field
contains a reference to the calling object of a frame. It had become apparent
that unconditionally including $frame['object'] in the result array can be
problematic in some
situations, and Sebastian proposed adding an optional Boolean parameter to
debug_backtrace(), named provide_object, to toggle
its inclusion. The new parameter defaults to a value of 1 to
preserve back compatibility.
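To make the effect of such a toggle concrete, here is a small sketch. It is written against current PHP, where the toggle is a bitmask option (`DEBUG_BACKTRACE_PROVIDE_OBJECT`) rather than the Boolean named in Sebastian’s proposal; the `Job` class is a made-up example.

```php
<?php
class Job
{
    public function run(bool $withObject): array
    {
        return $this->trace($withObject);
    }

    private function trace(bool $withObject): array
    {
        // DEBUG_BACKTRACE_PROVIDE_OBJECT is the default and fills in the
        // 'object' index of each frame; passing 0 leaves it out.
        $options = $withObject ? DEBUG_BACKTRACE_PROVIDE_OBJECT : 0;
        return debug_backtrace($options);
    }
}

$job = new Job();
$with = $job->run(true);
$without = $job->run(false);

var_dump(isset($with[0]['object']));
var_dump(isset($without[0]['object']));
```

Leaving the object out keeps the trace smaller and avoids holding extra references to objects that only appear in the call stack.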

François Laupretre came next, with a fix for bug #42396 in PHP 5. He wrote that the
configuration option --enable-zend-multibyte leads to
auto-detection of Unicode encoded scripts. This is fine until a script
contains null bytes following a call to __HALT_COMPILER(), when
execution results in ‘a lot of “?” garbage’. Effectively, this renders
anything using __HALT_COMPILER() (read: PHK or phar) incompatible
with --enable-zend-multibyte, with the only workaround being the
unacceptable one of turning off the detect_unicode flag.
François’ patch offered ‘a small detection loop’ to check for a
sequence of four 0xff bytes; if found, Unicode detection is
switched off and the script considered non-Unicode. His idea was that
deliberately setting the switch would make generated archives compatible with
the configuration option.
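The detection idea can be illustrated in userland PHP, although the actual patch lives in the engine’s C code and is not reproduced here. The sketch below simply scans a byte string for a run of four 0xff bytes; `has_ff_marker()` is a hypothetical name.

```php
<?php
/**
 * Return true when $source contains four consecutive 0xff bytes.
 * Hypothetical userland helper; the real check would sit in the engine.
 */
function has_ff_marker(string $source): bool
{
    $run = 0;
    $length = strlen($source);
    for ($i = 0; $i < $length; $i++) {
        // Extend the current run on 0xff, otherwise start over.
        $run = ($source[$i] === "\xff") ? $run + 1 : 0;
        if ($run === 4) {
            return true;
        }
    }
    return false;
}

$archive = "<?php __HALT_COMPILER();\xff\xff\xff\xff\x00\x00binary";
$plain   = "<?php echo 'hi';\xff\xff\xff \xff";

var_dump(has_ff_marker($archive));
var_dump(has_ff_marker($plain));
```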

Etienne Kneuss came up with his sixth attempt to bring late static binding
to PHP – this time without significant slowdown or memory usage increase, at
least as measured by Zend/bench.php. Unusually, he also offered some
documentation alongside the patch. Dmitry commented that Zend already have
a similar patch that also offers support for constants and runtime function
calls. Had they missed
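For anyone unfamiliar with the term, late static binding can be shown in a few lines. The sketch uses the `static::` keyword as it was eventually released in PHP 5.3, which is not necessarily how either patch mentioned here spelled it; the class names are invented.

```php
<?php
class Model
{
    public static function viaSelf(): string
    {
        return self::name();   // always refers to the class where it is written
    }

    public static function viaStatic(): string
    {
        return static::name(); // refers to the class the call was made on
    }

    public static function name(): string
    {
        return 'Model';
    }
}

class User extends Model
{
    public static function name(): string
    {
        return 'User';
    }
}

echo User::viaSelf(), "\n";
echo User::viaStatic(), "\n";
```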

Finally, Ben Ramsey offered an implementation of a new function,
setcookie2(), ‘to support the Set-Cookie2 response header
defined in RFC 2965’, which renders the original Netscape cookie
specification and RFC 2109 obsolete. Although noting that the only browser
currently implementing Cookie2 on the client side is Opera, Ben
listed the improvements over the original header: better user control over
cookie usage, better control over cookie deletion, and the ability to specify
a list of ports for which the cookie is valid. There were a couple of
statements Ben’s patch didn’t cover: cookie attributes signified by a name
beginning with $, and prioritizing Cookie2 if both
versions are present.

Stas was interested enough to take a look at Ben’s implementation, but
baulked at the 13 (!) arguments that came with the function. He recommended
passing an array instead. Ben, who had been given the same recommendation
off-list by Sara, replied that he was working on it.
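An array-based interface along the lines Stas and Sara suggested might be sketched like this. It is not Ben’s implementation: the helper name `build_set_cookie2()`, the option keys and the choice to URL-encode the value are all assumptions made for the example, and the helper only builds the header line rather than sending it.

```php
<?php
/**
 * Build a Set-Cookie2 header line from an options array.
 * Hypothetical helper; names and option keys are illustrative only.
 */
function build_set_cookie2(string $name, string $value, array $options = []): string
{
    $parts = [$name . '=' . rawurlencode($value), 'Version=1'];

    // Attributes that carry a value. Values are quoted, which also suits
    // the comma-separated port list.
    foreach (['Comment', 'CommentURL', 'Domain', 'Max-Age', 'Path', 'Port'] as $attr) {
        if (isset($options[$attr])) {
            $v = $options[$attr];
            if (is_array($v)) {
                $v = implode(',', $v);
            }
            $parts[] = $attr . '="' . $v . '"';
        }
    }

    // Flag attributes that take no value.
    foreach (['Discard', 'Secure'] as $flag) {
        if (!empty($options[$flag])) {
            $parts[] = $flag;
        }
    }

    return 'Set-Cookie2: ' . implode('; ', $parts);
}

$header = build_set_cookie2('sid', 'abc123', [
    'Path'    => '/',
    'Port'    => [80, 443],
    'Max-Age' => 3600,
    'Discard' => true,
]);

echo $header, "\n";
```

Compared with 13 positional arguments, callers name only the attributes they care about and the rest fall back to being omitted.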