Zend Weekly Summaries Issue #276

TLK: C++ extensions
TLK: pthread queries
TLK: Casts and unicode
TLK: C iterators, PHP classes
REQ: Asymmetry
BUG: ZIP in 5.1.2
TLK: True labelled break
FIX: safe_mode is gone
REQ: Deprecation marker
CVS: sys_getloadavg
PAT: stream_close

TLK: C++ extensions

Now completely versed in the mores of open source development, Andrew Mather
posted a short email to internals@ asking for further help. By now his first-ever
shared PHP extension was building and running nicely, and he’d ported a C++ data
server to Linux. The problem now was getting the two to talk. Is there a way for PHP
to communicate with a C++ library? Is there some kind of COM/.Net equivalent under
*nix that permits language-independent calls?

Sebastian Bergmann offered Andrew ext/dangerous, but Jeremy Johnstone suggested
writing the PHP extension itself in C++ and gave a rough overview of the
‘minimal’ changes needed in the extension in order to do this. I recalled
linking to Jay Smith’s tutorial about C++ extensions some months ago, and sent
Andrew to the weeklies index in search of it. George Schlossnagle had a working
version of the same link. Hartmut Holzgraefe took the opportunity to mail out ‘yet
another shameless plug’ for his baby, Codegen_PECL, saying that it supports C++
extension writing.

Andrew thanked us all individually off-list, which was nice of him. He also took
the trouble to write to internals@ to provide the working link and praise Jay
Smith’s tutorial, which provides full code for a C++ based extension. He noted that
the example given there doesn’t work with the PHP distribution that comes with SUSE
9.3, but reported that it works fine if you build PHP 5 directly from source.

Short version: Jay’s tutorial link has been updated in weeklies passim
as a result of this exchange.

TLK: pthread queries

Someone known only as Tomas wrote to internals@ hoping to discover why the
generated configure file explicitly ignores the pthread* libraries. Is it safe to
create extensions with pthreads, or are there some hard-to-debug problems with that?
He added that he has created a PHP extension with a lot of worker pthreads that
appears to work perfectly, but wondered whether there is some potential instability.

One Yoav Artzi wrote in a similar vein; his configure line includes
--enable-maintainer-zts and --with-tsrm-pthreads, but he’d
found that PHP’s configuration process automatically disables pthreads on detecting
cross compilation. The problem here was that Yoav needed a ZTS build because an
application he wanted to use requires it, but specifying ZTS appears to be the way
cross platform compilation is detected.

Short version: Come back Jani, all is forgiven!

TLK: Casts and unicode

MediaWiki developer Brion Vibber had been looking into the Unicode support
implementation, armed only with README.UNICODE (the design document contained
in the CVS HEAD branch of php-src) and the Paris PDM meeting notes. He’d found that
the former still refers to three types of strings, IS_UNICODE, IS_STRING and
IS_BINARY; unicode and string types could be implicitly concatenated and explicitly
cast to one another, whereas the binary type was a ‘black hole’ requiring a
conversion function beforehand. According to the notes, on the other hand, there
are now only unicode and binary types.

In an attempt to find out the state of the art, Brion had tested some strings to
see how they reacted, and wondered whether his results were intended behaviour or
the side effects of that change, terming them ‘worryingly inconsistent’.
Whether the unicode_semantics switch happened to be on or off, he’d
found that binary and Unicode strings can’t be concatenated, and the
(unicode) cast fails on binary strings, string variables and string
literals. With unicode_semantics turned off, (string) will
convert Unicode strings to binary; with the switch on, (string) casts
to Unicode, but there is no way to cast from Unicode string variables to binary
strings – although there is a way to cast to binary from Unicode string literals.

Brion argued that ‘there is (currently) no way to write Unicode literals that
will reliably work’. He argued that, even if the literal strings used in the
code are all ASCII, they can’t be mixed and matched with runtime
unicode strings because attempting to do so will throw a fatal error
when unicode_semantics is off. If the setting couldn’t be changed on a
per-request basis it would make things even worse, because many hosting providers
are likely to turn it off. Brion’s preferred solution would be to have an encoding
pragma setting that overrides unicode_semantics, and he asked whether
there was any interest in supporting such a mode?

Andrei Zmievski thanked Brion for his feedback. He agreed that
README.UNICODE is ‘a bit out of date’, and promised to update the file
when the conversion and casting decisions have been finalized. Andrei took the point
about portability, and had a few changes to propose:

Unicode support: casting and conversion

unicode_semantics  Cast       Converts                         Basis
Off                (unicode)  binary string to Unicode string  runtime_encoding
On                 (unicode)  binary string to Unicode string  TBD (runtime_ or
Off                (string)   Unicode string to binary string  runtime_encoding
On                 (string)   Unicode string to binary string  runtime_encoding

Note: Regardless of the setting, binary and Unicode strings cannot be concatenated.
You have to cast all operands to the same type.

Andrei felt that taking this approach would allow scripts to rely on the
behaviour of the cast operators, given that making the
unicode_semantics switch per-request is not an option. He called
Brion’s idea – treating all string literals as Unicode where an encoding pragma is
used – ‘interesting’, and hoped to explore it further.

Xuefer wrote in to explain that the lack of cross-type concatenation was a real
problem, and threw in

] .

to demonstrate why; $_SERVER array elements are imported from httpd.
An option in that situation might be to do something like


for a single line, or for the whole script, as per Brion’s idea of a pragma. He
noted that code using either the b cast or
pragma/declare() solution couldn’t be parsed by earlier PHP versions…
and also that there would be a need for declare() to support binary
strings too. In any event, having declare() at the top of most scripts
would render unicode_semantics pointless. He believed that the only
reason to disallow $binary.$unicode might be performance, and it would
be better to use E_STRICT or a profiler to pass on information about the
implicit cast in that case.

Short version: One headache precedes another.

TLK: C iterators, PHP classes

The Clayton formerly known as l0t3k had been working on Unicode-based collation
at extension level, and had some questions about using iterators in C code. He
wanted to know whether the interface he’d picked up from
zend_class_entry->get_iterator would also handle userland classes
implementing the Iterator interface?

Confused by the question, Marcus Börger looked at the code Clayton had provided
and pointed out that he was using a C-level iterator manually rather than within a
foreach() construct; this meant he was accessing the iterator
index property – originally intended to be private to a couple of
opcodes. However, the more Marcus thought about it, the less ‘wrong’ that approach
seemed. He went on to write that, if an object – any object – implements
Traversable or a Traversable-derived interface, its
zend_class_entry would have a get_iterator function giving
access to an iterator struct and accompanying handler table; that struct’s data
members should never be touched directly.

Clayton explained that he was trying to allow sorting by iterator as well as
standard arrays, and gave a userland example of what he hoped to achieve:


$coll = new Collator("en_US"); // assume en_US as default locale
$iter = new TextIterator("The rain in Spain drops gently on the Plains",
    TextIterator::WORD);
var_dump($coll->sort($iter)); // returns the sorted list of words
$expr = new Regex($pattern,
var_dump($coll->sort($expr, SORT_DESC)); // Sorts matches in descending order


This was the reason for using the iterator manually in his code, which (it
transpired) aims to become a part of ext/unicode.

Later, he wondered about that injunction not to touch the iterator data members;
did Marcus mean he should make a copy of whatever
iter->funcs->get_current_data() comes up with? Clayton felt there
was some inconsistency when it came to dealing with the current element in
iterators; Unicode’s TextIterator reuses the same zval
value at every pass, whereas SPL classes call zval_ptr_dtor() between
iterations. Marcus explained that the iterator classes can be overloaded in SPL,
whereas TextIterator can not. He went on to write that when
foreach() is used, data copying is automatic, but Clayton’s manual
approach to iteration meant that he would need to perform the copy himself;
‘there is no inconsistency, just different usage’.

Short version: Life on the bleeding edge, eh?

REQ: Asymmetry

Sara Golemon posted a request to make a minor change to the parser. She explained
that, currently, when the parser comes across

expr1 > expr2

it silently switches it around to

expr2 < expr1

and reuses the ZEND_IS_SMALLER opcode.

This practice was giving her some headaches with her latest baby, pecl/operator, which
implements operator overloading for objects. In the extension, overloaded binary
operators should be left associative; given the expression

expr1 op expr2

expr1 ‘decides’ how it will combine with or compare to expr2. Since
the greater-than operator is being quietly flipped in the parser, the extension has
no way to know whether > or < was intended, and has
to assume

op1 < op2

even where the intention was

op2 > op1

To resolve the problem, Sara had bundled a non-intrusive patch with pecl/operator to set the
zend_op property extended_value at either 1 or 0,
depending on whether the operator was genuinely greater-than or not. Would there be
any chance of getting that patch merged into the Zend Engine?

Zeev Suraski looked into it, and at first was bemused by Sara’s claim that
expr1 would still need to ‘decide’ its relationship with
expr2 despite operators being left associative. He agreed that that
patch appears ‘harmless enough’, but asked her to elaborate on the problem it
was intended to fix. Sara carefully reiterated the issue; the parser ‘sees’
$a > $b as $b < $a, which – although an unimportant
distinction for all normal PHP comparisons – sends her overloaded object operators
directly into reverse comparisons.

Following a disclaimer (‘I am not a big fan of operator overloading…’),
Andi Gutmans suggested that it might be better to require a consistent design than
to make the operator left associative. The overloaded object should ‘know’ how to
deal with objects that it can or can’t be compared to, and the extension should
‘know’ when an object is not overloaded. If there needed to be any change in
the parser it would be better to have an IS_GREATER opcode than ‘an
extended_value hack’ – but given that operator overloading isn’t
and probably shouldn’t be a part of standard PHP, making any such changes didn’t
make immediate sense. That said, it might be possible to split
IS_SMALLER into two opcodes for PHP 6…

Sara actually agreed with Andi’s point that operator overloading can lead to
unreadable code, and wrote that she had no intention of ever trying to make her
extension a part of PHP; it is simply a tool some people find useful. She had,
however, given the design of her overloading approach some thought. The idea had been
to allow asymmetry between the operands, ‘as sometimes commutative properties
don’t apply cleanly’. Examples of this issue obviously include ‘sub/div ops’,
but also appear in things like matrix maths, where ‘a[1,3] * b[3,1] is certainly
different than b[3,1] * a[1,3]’. She agreed to look into applying Andi’s thesis to the
IS_SMALLER operator at least, being fairly certain that 2 is always
greater than 1, but was much happier when she reached the part about splitting the
opcode into two. Andi felt that there needed to be more compelling reasons to make
such a change in PHP 6; in the meantime, Sara’s patch should probably stay where it
is.
Meanwhile Zeev was wondering whether Sara wasn’t ‘slightly worried about the …’ of
x > y having a different meaning to y < x? He wrote that those languages with
support for operator overloading
also support strict typing, which would prevent that situation from arising. He
agreed entirely with the theory that 2 is probably always greater than 1, and added
that he tends to see relation operators as being in a different category to
‘operation operators’. Given that, he didn’t think there was a need to allow for the
possibility of x > y and y < x having different
meanings. Finally, Zeev suggested that Sara should reconsider her decision to allow
comparison between two different object types, pointing out that it opens up the
possibility of inconsistent results. He believed that ‘the rest of the operators
should be compatible with non-commutative implementations’.

Stefan Walk took issue with Zeev’s assertion over strictly typed languages. He
argued that Ruby supports operator overloading and has no problems with different
meanings for x > y and y < x, although he didn’t
know of a core class that uses it. Beyond that, wrote Stefan, ‘PHP already breaks
the transitivity rule for the equality operator… $a == $b and $b == $c does not
imply $a == $c’. How was it worse to
have an extension that could break the symmetry of PHP’s comparison operators? Zeev
in turn disagreed, saying that although there may be some unique cases where
transitivity is not maintained in PHP, those cases are unique, and arguably
make sense. Moreover, the discussion in hand was more fundamental than transitivity
issues; ‘it’s about the very meaning of smaller-than/greater-than, and the
relationship between them’.
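
Stefan’s transitivity point can be tried out with a few loosely compared values. The three values below are my own choice for illustration, not taken from the thread; they are picked so that the result does not depend on how a given PHP version compares numbers with non-numeric strings.

```php
<?php
// Illustrative values only: chosen to show == is not transitive.
$a = "0";
$b = false;
$c = "";

$ab = ($a == $b); // "0" is converted to Boolean false
$bc = ($b == $c); // "" is converted to Boolean false
$ac = ($a == $c); // two strings, compared as strings

var_dump($ab); // bool(true)
var_dump($bc); // bool(true)
var_dump($ac); // bool(false)
```

Each pair is compared under a different conversion rule, which is why the chain breaks.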

PHP Documentation Group member Jakub Vrana had found something interesting in the

$a = array(0, 1);
$b = array(1 => 0, 0 => 1);
var_dump($a < $b); //
var_dump($a > $b); //

Zeev agreed this was interesting, and wrote that it was exactly
the kind of thing he’d been worried about introducing. In this case it seemed there
was a conceptual bug in zend_hash_compare(), which he felt should
probably be fixed.
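
To see why key order matters here, a rough userland approximation may help. This is only a sketch of how I understand an unordered hash comparison to behave (walk the first array’s keys in its own order, look each one up in the second array, stop at the first difference); compareSketch() is a hypothetical name and not the engine’s zend_hash_compare().

```php
<?php
/**
 * Hypothetical approximation of an unordered hash comparison.
 * Walks $x in $x's own key order and returns the first non-zero
 * element comparison against the same key in $y.
 */
function compareSketch(array $x, array $y): int
{
    if (count($x) !== count($y)) {
        return count($x) <=> count($y);
    }
    foreach ($x as $key => $value) {
        if (!array_key_exists($key, $y)) {
            return 1; // key missing: treated as "not comparable" here
        }
        $result = $value <=> $y[$key];
        if ($result !== 0) {
            return $result;
        }
    }
    return 0;
}

$a = array(0, 1);
$b = array(1 => 0, 0 => 1);

// "$a < $b" walks $a first: key 0 compares 0 with 1.
$aSmaller = compareSketch($a, $b) < 0;
// "$a > $b" is flipped to "$b < $a", so $b is walked first:
// key 1 compares 0 with 1.
$aGreater = compareSketch($b, $a) < 0;

var_dump($aSmaller); // bool(true)
var_dump($aGreater); // bool(true)
```

Under these assumptions both comparisons succeed, because each one stops at a different key.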

Short version: One of those threads you need a good dictionary and no
sense of humour to get through.

BUG: ZIP in 5.1.2

PHP user Kip Krueger wrote to say that he’d been planning to install PHP 5.1.2 to
resolve a security issue, but had found that doing so broke the zip facilities
offered by pecl/zip. He had
already submitted a PECL bug report
against the extension, saying that it appears to load correctly but the zip
functions are not available. In the meantime he’d like to know (among other things)
whether the problem was in the PHP 5.1.2 release itself, or in the version of
php_zip.dll available through the pecl4win site?

Pierre-Alain Joye took responsibility for the issue, being the new and
as-yet-unlisted maintainer for the zip extension. He explained that the Windows
builds available on pecl4win are actually CVS snapshots, which situation he
described as a bug in itself. Pierre went on to explain that he had recently started
work on a completely new version of the extension, with the aim of implementing
much-needed write support and an OO interface for PHP 5.1 and up. As is standard, he
was working in CVS HEAD towards that alpha release – and CVS HEAD is used for the
Windows builds.

Pierre went on to offer Kip a release of the older, stable version that would
include an update to ‘fix’ the security issue mentioned, but it later transpired
that the security problem had nothing to do with the zip extension itself.

For anyone else with the same issue: you can download the PECL 5.1.1 bundle in
the PHP museum, and the
php_zip.dll available in that zipfile will run happily with PHP 5.1.2. This
isn’t always true of extensions from earlier release packages, but happens to be so
in this case.

Off-list, there was a loooooong debate about the role of the pecl4win site, and
the need to create known stable releases of PECL extensions for Windows systems, as
a direct result of Kip’s report. It’s not actually possible to automate stable
PECL/win32 builds at this moment in time, but will be at some point in the
not-too-distant future. We’re still discussing the best way to make them available,
as the existing snapshots are also very useful – both to the inner development
circle and the wider PHP community.

Short version: Watch this space.

TLK: True labelled break

I took the bull by the horns and re-started the discussion over labelled breaks, and no it
wasn’t because I was looking for ‘weeklies’ material. If anything the opposite was
true; the ‘compromise’
Dmitry Stogov and Sara presented to the internals list in early
December was never given due consideration, thanks to the volume of list mail over
GOTO at the time. I hoped to avoid a repeat of the GOTO
discussions while giving their idea and patch a chance of a fair review.

Ilia Alshanetsky promptly threw out a -1 vote, apparently under the impression
that the patch had been discussed at length before. I asked him to look into it
again. Andi Gutmans recognized that the concept is in fact ‘orthogonal to the
GOTO discussion’, but felt that it didn’t add anything substantial
to PHP and could confuse users. He hoped that discussing the patch now would be a
way to ‘bed’ it once and for all. Kevin Waterson agreed it was ‘a third
…’ – apparently that’s an Australian expression meaning ‘superfluous’. I
argued that, for users that don’t find numbering intuitive, having the option of
labels is useful; I just saw this as a more user-friendly way of allowing nested
breaks. Andi pointed out that nested breaks aren’t much used in PHP in the first
place, but I felt this might be precisely because they aren’t straightforward
to use. Still, if nobody on the core team cared enough to argue the point further,
labelled breaks weren’t likely to be adopted anyway…

Off-list, Andi repeated that nested breaks are a very marginal feature, and asked
me to look into their usage; he wasn’t prepared to add labels if nobody was likely
to use the new syntax. I already knew they aren’t a major feature, and the research
I undertook gave unsurprising results; nested breaks tend to be used in parsers and
error handling, and nowhere else. Derick Rethans came into it to mention that
little-used features aren’t implicitly not useful, and stated that he would also
like to see Dmitry and Sara’s patch committed. Andi replied that he wasn’t against
the patch itself, or the syntax, but repeated that the level of user uptake was a
concern to him. Wez Furlong believed that ‘not many people use it because it’s
difficult to use’, and argued that having real labels could potentially change
that situation. He added that he is pro-GOTO, and would be for labelled
breaks only if GOTO is no longer an option. I reiterated that this
particular patch does not preclude GOTO ever being implemented;
a lot of people are seeing labelled breaks as a direct competitor to
GOTO, thanks to the long discussion over the original Paris PDM suggestion
of breaking to labels that preceded this patch. The only thing it precludes is that
‘breaking to labels’ idea,
which nobody was happy about implementing in PHP anyway.

Derick wrote in promising to use labelled breaks if we get them, as did Hartmut.
Both said they had places where they would be used. Hartmut referred to break n;
as ‘a maintenance nightmare’, and claimed that he currently works
around it by using flag variables when he needs to terminate deeply nested loops.
Labelled breaks are so much cleaner than either approach, he’d love to have them;
whereas break n; is ‘almost in the same league as goto’.
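
For readers who haven’t met either workaround, here is a minimal sketch of the two approaches Hartmut contrasts: the numbered break and the flag variable. The data and variable names are made up for illustration; the proposed label syntax is not shown because it never existed in a released PHP.

```php
<?php
// Illustrative data: find the position of $target in a nested array.
$grid = [[1, 2, 3], [4, 5, 6], [7, 8, 9]];
$target = 5;

// Approach 1: numbered break. The "2" has to be recounted by hand
// whenever a loop level is added or removed.
$foundAt = null;
foreach ($grid as $r => $row) {
    foreach ($row as $c => $value) {
        if ($value === $target) {
            $foundAt = [$r, $c];
            break 2;
        }
    }
}

// Approach 2: flag variable, checked again at each outer level.
$foundAtFlag = null;
$done = false;
foreach ($grid as $r => $row) {
    foreach ($row as $c => $value) {
        if ($value === $target) {
            $foundAtFlag = [$r, $c];
            $done = true;
            break;
        }
    }
    if ($done) {
        break;
    }
}

var_dump($foundAt === $foundAtFlag); // bool(true)
```

Both loops stop at the same element; the difference is purely in how much bookkeeping the reader has to follow.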

Short version: A rose is a rose is a cabbage. (Don’t get confused, now.)

FIX: safe_mode is gone

Following the Paris PDM recommendations, and (unusually) with the approval of the
entire spectrum of PHP developers and users, Andi went to work on removing
safe_mode from CVS HEAD this week. He got as far as the streams code
before he hit a problem, and wrote to Sara asking whether the
open_basedir check in php_plain_files_unlink() was
actually intended to be set up the way it was. The check was only being called when
safe_mode was on, as opposed to when open_basedir was
enforced. Was this a simple error in the source?

Sara agreed that it was in fact a bug, and admitted that she’d probably
introduced this ‘slight logic twist’ when routing unlink() to
use stream wrappers. She looked into it and decided there was some good news; for a
start, it would be uncommon to have both safe_mode and
open_basedir either on or off simultaneously, and secondly, the only
place in PHP source using that internal unlink() call is the userspace
unlink() function.

Andi thanked Sara for her analysis and asked her to keep an eye on his commits to
ensure he didn’t apply a wrong fix in the stream wrapper code. She did; he didn’t.
Andi went on to clean all trace of the safe_mode implementation from
the rest of the PHP core and most of the core extensions, ably assisted by Ilia.

Short version: Yay!

REQ: Deprecation marker

Marcus mailed the list to introduce ‘a tiny patch that allows us to deprecate
functions’.
Whereas PHP currently issues an E_STRICT or E_NOTICE
(depending on the version) for items that are going to change in the future, this
isn’t a good solution when the change happens to involve replacing a function’s
name. The current approach is to rename the existing function and add an alias for
the old name, but the alias isn’t capable of issuing the deprecation message this
way; the function would need to be implemented twice over to allow it to do so.
Marcus’ patch allows a deprecation flag to be specified on the alias.
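
A loose userland analogy of the problem, for illustration only: without a flag on the alias, the old name needs a body of its own just to emit the message. The function names below are hypothetical, and the real patch works on internal function entries in C, not on userland functions like these.

```php
<?php
// Hypothetical new, preferred name.
function new_name(int $x): int
{
    return $x * 2;
}

// Hypothetical old name: a second implementation that exists only
// to emit the deprecation message before delegating.
function old_name(int $x): int
{
    trigger_error('old_name() is deprecated; use new_name()', E_USER_DEPRECATED);
    return new_name($x);
}

// Collect messages instead of printing them, so the effect is visible.
$seen = [];
set_error_handler(function (int $errno, string $errstr) use (&$seen): bool {
    $seen[] = [$errno, $errstr];
    return true;
});

$result = old_name(21);
restore_error_handler();

var_dump($result);      // int(42)
var_dump(count($seen)); // int(1)
```

A flag on the alias entry removes the need for that second body altogether.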

He went on to say that Andi had reviewed an earlier version of the patch, and had
disliked the introduction of if() statements specifically checking for
the ZEND_ACC_DEPRECATED flag in three separate places. He’d now hidden
the check behind the ‘abstract’ check in two of those cases, the third being in the
explicit call handler where it is a simple, effectively penalty-free integer
comparison. In fact there could be an ‘abstract’ check missing there too, and he’d
investigate that idea. Was anyone now against his applying those changes?

Sebastian and Sara both made their support known, with Sara mentioning in passing
that she had in fact suggested this marker two years ago. She liked the idea of
combining the checks to address the performance concerns, but wanted to see defines
for PHP_DEP_FE() and PHP_DEP_FALIAS() added alongside the
new flag.

Meanwhile Marcus looked a little deeper into zend_call_function()’s
‘missing check’ for ZEND_ACC_ABSTRACT and discovered that error
generation is deliberately delayed there; there’s even a special opcode to allow
that behaviour. He suggested that moving the ‘abstract’ test into the function, the
same as within the function call helper opcode, might be a way forward – but there
was no further discussion on this or Sara’s points on internals@, and the proffered
patch subsequently went into CVS without further changes.

Short version: ZEND_ACC_DEPRECATED is in CVS HEAD;
refinements and a PHP_5_1 implementation could follow.

CVS: sys_getloadavg

Changes in CVS that you should probably be aware of include:

  • Memory leaks on duplicate cookies fixed across all branches (bug #36205) [Dmitry]
  • ReflectionProperty now returns the correct visibility level
    (bug #36337) [Ilia]
  • In ext/PDO_PGSQL, getColumnMeta() no longer crashes
    (bug #36382) [Derick]
  • In ext/PDO_MySQL, problems with loading large BLOBs were
    fixed (bug #36345)
  • A Zend Engine leak on big doubles was stemmed across all branches [Tony]
  • Apache custom 5xx errors return the correct HTTP response error code now
    (bug #36400) [Tony]
  • In ext/oci8, oci_execute() now supports
    OCI_DESCRIBE_ONLY again (bug #36403) [Tony]
  • An ext/mysqli crash when accessing num_rows after calling
    close() was fixed (bug #36420) [Ilia]
  • A useful new function, sys_getloadavg(), was added to the PHP
    core in CVS HEAD and PHP_5_1 branch (only). The function has been running in
    production on www.php.net for a
    couple of years, and is deemed stable. [Wez]
  • In ext/dba, dba_exists() and dba_fetch()
    should now work with Berkeley DB4 (bug #36436) [Marcus]
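
A minimal sketch of calling the new sys_getloadavg(), assuming it returns the 1, 5 and 15 minute load averages as an array and may be missing on some platforms (hence the guard). The output format is my own.

```php
<?php
// Guarded call: the function is not available on every platform.
$load = function_exists('sys_getloadavg') ? sys_getloadavg() : false;

if (is_array($load)) {
    // Assumed order: 1, 5 and 15 minute averages.
    printf("load: %.2f %.2f %.2f\n", $load[0], $load[1], $load[2]);
} else {
    echo "load average not available here\n";
}
```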

In the background Marcus was busy working on iterators, both in the Zend Engine
and in ext/unicode, in CVS HEAD. He managed to speed up the Unicode
TextIterator and ReverseTextIterator implementations
mid-week. Clayton suggested a small change (using the offsetof() macro in
text_iter_to_obj() rather than a bunch of static declarations) to help the win32
build along – hopefully that’ll work with all the other compilers PHP’s built
under, too. Over the weekend, Marcus fixed a more widespread memory corruption
issue he’d discovered when foreach is used with iterators, and sorted out an index
bug he’d found thanks to Clayton’s manual manipulation of those earlier in the
week.
Short version: A steady trickle of CVS traffic all this week.

PAT: stream_close

Brion’s foray into Unicode support uncovered an error in the default setting for
the unicode_semantics .ini directive. Naturally everyone working
on the implementation has unicode_semantics on, so nobody’d noticed
that the default setting “off” had been set as a string – which, in PHP, evaluates
to true. Brion posted a small patch to give the setting a Boolean value
alongside this observation, which Marcus promptly applied. Marcus also added a new
function, unicode_semantics(), to easily test that particular
.ini setting from within PHP.
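
The string-versus-Boolean trap is easy to reproduce in userland. This sketch only shows PHP’s ordinary truthiness rules next to an explicit parse with filter_var(); it does not reproduce the engine’s .ini handling, and the variable names are illustrative.

```php
<?php
// A default stored as the string "off" rather than a Boolean.
$default = "off";

// Any non-empty string other than "0" is truthy in PHP.
$asBool = (bool) $default;

// An explicit Boolean parse treats "off" as false.
$parsed = filter_var($default, FILTER_VALIDATE_BOOLEAN);

var_dump($asBool); // bool(true)
var_dump($parsed); // bool(false)
```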

I was also looking at CVS HEAD for the first time this week, and somehow managed
to fall across the last pval in the entire source code.
zval replaced pval with the advent of PHP 4, but PHP 6 is
the first version to lack support for it; that single wrong letter was breaking
three internal functions. Dmitry applied the patch.

Andrei applied a fix to preg_quote() in ext/pcre in all
current branches of PHP following a report from Jeffrey Friedl that the
NULL byte was being wrongly escaped there, causing problems when
working with octal digits.

Sebastian requested – and got – a function renamed in the Reflection API, when he
argued that ReflectionParameter, ReflectionMethod and
ReflectionProperty should either all have getClass() or
getDeclaringClass() methods, but not a mixture of both. Marcus took his
point, and altered ReflectionParameter::getClass() to match the others.
BC was maintained in PHP_5_1 branch, but the method is now named
getDeclaringClass() in CVS HEAD.
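
With the naming aligned, all three reflectors can be queried the same way. A small sketch using made-up classes Base and Child; it assumes a PHP version in which ReflectionParameter::getDeclaringClass() is available.

```php
<?php
// Illustrative classes: the members are declared in Base, inherited by Child.
class Base
{
    public $prop;

    public function method($param)
    {
    }
}

class Child extends Base
{
}

$method    = new ReflectionMethod('Child', 'method');
$property  = new ReflectionProperty('Child', 'prop');
$parameter = $method->getParameters()[0];

// Same method name on all three reflectors.
$names = [
    $method->getDeclaringClass()->getName(),
    $property->getDeclaringClass()->getName(),
    $parameter->getDeclaringClass()->getName(),
];

print_r($names);
```

All three report the class in which the member was declared, not the class it was looked up through.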

Michael Wallner caught a typo in Dmitry’s patch to make the switch for Unicode
server-wide. It had a misplaced #endif that prevented ‘release’ builds
of the PHP source in CVS HEAD. Dmitry quickly fixed it, and thanked Michael for
picking it up so quickly.

Finally, PHP user Jared Williams wondered why fclose() returns a
Boolean value but stream_close() is void? Wez replied that
stream_close() could return ‘any value it likes’, but that value
is ignored internally; he agreed that this was a bug, and asked for a volunteer to
provide the fix because he didn’t have a CVS checkout of the appropriate PHP
version. Jared responded with a patch, which Wez reviewed (‘Looks good’) but was
unable to commit, for the same reason. Volunteers welcome.
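
Anyone wanting to poke at this can do so with a throwaway userspace wrapper. The class name and the demo:// scheme below are hypothetical; the sketch simply returns a value from stream_close() and prints whatever fclose() reports, without assuming what that will be on any given PHP version.

```php
<?php
// Hypothetical wrapper used only to observe the close path.
final class DemoStream
{
    /** @var resource|null Populated by PHP when a context is supplied. */
    public $context;

    /** @var bool Records whether stream_close() was invoked. */
    public static $closed = false;

    public function stream_open(string $path, string $mode, int $options, &$openedPath): bool
    {
        return true;
    }

    public function stream_flush(): bool
    {
        return true;
    }

    public function stream_close()
    {
        self::$closed = true;
        return false; // deliberately odd, to see whether fclose() notices
    }
}

stream_wrapper_register('demo', DemoStream::class);

$fp = fopen('demo://anything', 'r');
$result = fclose($fp);

var_dump(DemoStream::$closed);
var_dump($result);
```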

Short version: Streams patch is online already; nothing new for
PAT this week.