Zend Weekly Summaries Issue #359

December 3, 2007

Uncategorized

TLK: PHP 5.3 feature list (summary)
TLK: PDO and MySQL
TLK: Late static binding
TLK: Mark functions as const [continued]
FIX: Iconv holes
RFC: Additional information for op_arrays
CVS: Coverity reports get some attention
PAT: ODBC and ext/filter

16th September – 22nd September 2007


TLK: PHP 5.3 feature list (summary)

Ilia Alshanetsky, still wearing his RM hat, announced the href="http://bb.prohost.org/53Features.pdf">final tally of votes for
features in PHP 5.3. He thanked the 28 people who had voted.

The voting options had been 1, 0, or -1, and Ilia had decided to kick out
anything that didn’t score at least 10. This left PHP 5.3 with:

  • namespace support
  • ext/intl
  • late static binding
  • SQLite 3 support in ext/sqlite
  • the mysqlnd library in the core
  • OpenID support in ext/openssl
  • new array_replace[_recursive]() functions
  • E_DEPRECATED
  • zend_arg_info_const()
  • ZEND_SIGNED_MULTIPLY_LONG() optimization
  • new INI parser/scanner
  • __callStatic()

Of the remaining 6 items, only the proposal to remove safe_mode
et al in PHP 5.3 had a negative score.

Stas Malyshev wondered about those features with low scores, noting that -1
and +1 aren’t the same thing as 0 and 0. He also wondered if there would be a
PHP 5.2.5. Ilia dismissed the low score items, on the grounds that low scores
reflected a lack of interest or consensus. Anything that hadn’t got through
this poll could be revisited at a later point release. As for 5.2.5, given
that the agenda for PHP 5.3 is ‘fairly busy‘, Ilia expected that
PHP_5_2 would be the stable release branch for at least the next six months.

Lukas Smith re-jigged the data to make it easier on the eye and href="http://oss.backendmedia.com/PhP53VoteResult">put the results on his
wiki. He wrote that several items were difficult for the wider community
to vote on due to a lack of expertise, something several voters had noted at
the time. Given that factor, he found Ilia’s blanket threshold
questionable‘. Lukas also asked everyone to check that their votes
had been interpreted correctly, since some of the voting had been
conditional. Ilia reiterated that anything without mainstream support was a
niche feature in his view; he’d been trying to identify key features. In his
view, introducing the rest would be tantamount to encouraging feature creep.

I posted an analysis of
the failed features. Two – a switch in ext/mysqli and the GCC 4
-fvisibility patch – had no quorum; they drew only 3 and 8 votes
respectively. Three – strict classes, GC and ext/phar – had far more
positive than negative votes. The only item that was cleanly voted out was
safe_mode and friends. It seemed obvious, to me, that the first
two had failed purely because those not directly affected by them didn’t
understand them enough to express an opinion either way. Those who did
understand them had overwhelmingly voted for them, but were in the minority.
That left just three genuinely contentious items. The only question was when
to debate those three.

Marcus Börger, pointing out that his baby was one of the three, asked me
to drop the politics. I hadn’t been trying to be political; it just didn’t
make sense to me to dictate which features could or couldn’t go into an
extension or specific build system purely on the level of outside interest. I
agreed with Marcus that the three contentious items could probably wait.
Marcus argued that if we were going the route of democracy then we should
accept its decisions in full; the alternative was to have polls whose results
are ignored. Ilia wrote that the point of “core” was common usage, not niche
features. I argued that in that case, niche features should never have been
on the list. Nuno Lopes backed me 100% in this.

Stas tried again; there is a big difference between -fvisibility
and strict classes. One changes nothing within the language and benefits a
handful of PHP extension builders. The other changes the language and adds a
controversial feature, affecting all. Clearly, these are not equal. The
latter can’t really be decided by a simple vote anyway; it needs greater
consideration, and should be discussed.

David Wang agreed with Stas. He wrote that the vote had been ‘very helpful
in narrowing the field of debate
‘; in his view, it meant there are now
only three items open for discussion. He felt that high numbers of zero votes
reflected a lack of information, and thought perhaps there should be a
separate round of voting on those three issues after discussing them. David
was fully aware that there hadn’t been a widespread review of his GC patch,
for example – which is stable, doesn’t affect those not using it, doesn’t
affect performance, and would help make PHP a better choice for anything
running longer than a Web script. He added that he’d been keeping the
prerequisite macro patch up to date for so long it was becoming an
embarrassment. Lukas noted that he, for one, hadn’t understood until now that
GC would not be enabled by default.

Derick Rethans, picking up on the exchange between Ilia and myself, explained
to Ilia that it’s not so simple as a lack of interest implying a niche
feature. ‘Not everything can be done in extensions‘; besides, as Lukas
had said, not everyone felt qualified to vote on all the items.

Short version: Lesson: find out how votes will be measured before
casting yours.


TLK: PDO and MySQL

Lester Caine wondered how far into the future the PHP 5.3 effort would push
PHP 6… More pertinently, he wondered how useful the mysqlnd
library is, to those who never enable ext/mysql (read: Lester Caine).

Pierre explained that both ext/mysql and ext/mysqli can use
mysqlnd, but added that he had ‘serious doubts‘ about the
motivation of the MySQL team to support PDO. Lars Gunther demurred, noting
that the MySQL bloggers had written about ext/mysqli first, PDO
second, and hadn’t mentioned ext/mysql at all. He felt that the onus
was on the PHP community to commit to PDO, for example by using PDO in
introductory tutorials, blog posts and books. Besides, wasn’t it the case
that the PDO_MYSQL maintainers are not MySQL AB employees? Pierre
pointed out that, historically, MySQL’s attitude has been ‘mysql first,
then mysqli, and pdo “if it works well when you are done”
‘. George
Schlossnagle, Wez Furlong and Ilia are listed as maintainers because they had
done the groundwork on the PDO_MYSQL driver. (Pierre didn’t say it, but
this came about purely because MySQL AB refused at the time.) Then again,
MySQL AB employees haven’t been maintainers or initiators of any of the .NET
database layers either…

Lukas intervened to announce that he was due to attend a developers’ meeting
in Heidelberg in the near future. He would try to get a commitment from MySQL
AB to PDO development there and then.

Short version: May the force be with Lukas.


TLK: Late static binding

Etienne Kneuss had now tested Dmitry Stogov’s late static binding patch, and
wrote to Dmitry to alert him to a problem. The patch failed one of Etienne’s
tests, and he believed this might show a difference in their conception of
LSB usage. In the following code:

class A {
    public static function
foo() {
        static::
who();
    }

    public static function who() {
        echo
__CLASS__."\n";
    }
}

class B extends A {
    public static function
test() {
        
A::foo();
    }

    public static function who() {
        echo
__CLASS__."\n";
    }
}

B::test();


Etienne would expect the output to be A. Dmitry’s patch returned
B. He wasn’t sure that it would be easy to fix Dmitry’s patch to
return A. On the other hand, it would be fairly straightforward
to add the extra functionality in Dmitry’s patch to his own. Etienne followed
up with href="http://patches.colder.ch/Zend/late_static_bindings_take7.patch?markup">a
link to his updated version.

Mike Lively agreed that that code should return A, but thought
it equally important to be able to pass the name of the caller when dealing
with inheritance. For example, if there were a child class that needed to
extend an inherited method:


class Foo {
    const
TEST_CONST = 'foo';

    public function test() {
        return static::
TEST_CONST;
    }
}

class Bar extends Foo {
    const
TEST_CONST = 'bar';

    public function test() {
        
//do class specific things
        //try to continue the function chain
        
return parent::test();
    }
}

it should have a way to return bar rather than, as now,
foo. It would make the most sense to him if it were possible to
get that return value by having


return static::test();


but that actually caused a segfault with Etienne’s patch… Etienne pointed
out that endless recursion was responsible for that segfault, but agreed that
Mike had a point. However, he addeed, LSB is usually used to avoid
redeclaring static methods in child classes. Mike replied that occasionally
it’s necessary to have the child class do a little extra, and ‘if this
breaks LSB with no workaround… that would be no good
‘.

Stas agreed with Etienne’s original post; he also thought the return value
there should be A. Dmitry agreed, too, when he caught up with
his mail; he thanked Etienne for catching the problem. He’d now reviewed
Etienne’s latest patch, and found only one serious problem – it didn’t handle
nested calls. However, there were aspects of Etienne’s work that Dmitry liked
better than his own, and href="http://news.php.net/php.internals/32367">his latest version
actually incorporated those, alongside several optimizations. He suggested
that Etienne should also mix and match; they were aiming for the same
behaviour, so the emphasis now was purely on speed.

Mike intervened again. He’d found ‘a serious problem‘ with both
patches, again focusing on inheritance:


class A {
    static public function
test() {
        echo
get_called_class()."\n";
    }
}

class B extends A {
    static public function
test() {
        
// Performing additional needed tasks.
        
parent::test();
    }
}


In B::test(), it was not possible to identify the calling class.
Mike felt that there should be a way to do this; either parent::
should forward the information or there should be a new keyword for it.
Dmitry agreed (‘Good catch!‘), and added support for this to the
patch. However, he was now uncertain which behaviour should be in the final
version. Support for inheritance would certainly provide more flexibility,
but it would also make the concept of late static binding harder to
understand. Lukas thought that, since inheritance is an advanced OO concept
anyway, keeping things consistent was more important than keeping them
simple. Marcus agreed, given that inheritance is supported ‘all over‘.
Dmitry then tried to clarify the expected behaviour. Should it be:


class A {
    function
foo() {
        return
get_called_class();
    }
}
class
B extends A {
    function
bar() {
        return
A::foo();
    }
}
class
C extends A {
    function
bar() {
        return
B::bar();
    }
}
echo
B::foo(); /* returns B */
echo B::bar(); /* returns B, because B::bar() calls A::foo() and A is parent of B */
echo C::bar(); /* returns B, because C::bar() calls B::bar() and B is not parent of C */


Or should the last two both return A?

Marcus preferred that they all return B, but Stas had apoplexy.
How could it be right to call A::foo() and get B?
If you call A::foo() you should always get A,
regardless of where it is called. Given that B::bar() in this
script always calls A::foo(), the last two lines should return
A, in his view. Mike thought that if they all returned
A, and if there was no way to overload a static function
and forward the original called class to the parent, the usefulness of
late static binding would be much diminished. Stas demanded to know what Mike
meant by ‘overloading a static function‘ and what this
usefulness‘ would be useful for exactly. Mike obliged with href="http://news.php.net/php.internals/32410">a more concrete example.
Stas looked at it, but didn’t understand why Mike couldn’t just replace a
call to parent::buildSelectQuery() with a call to
static::buildSelectQuery() to get the intended result. Dmitry
explained: ‘With static:: you are not able to call a method of the parent
with the same name
‘. Stas gave up in disgust:

I'm starting to dislike all this idea of "let's have an object without
really having an object"... It's becoming messy now - we now need to
start to invent all kinds of complex propagation rules for it in
parallel with the true object propagation rules. Why not just have an
object then?

Alexey Zakhlestin didn’t help matters by offering a third way (via a mail
client that news.php.net can’t deal with, grr):


class A {
    function
foo() {
        return
get_called_class();
    }
}

class B extends A {
    function
bar1() {
        return
self::foo(); // it is called using inheritance-rules
    
}

    function bar2() {
        return
A::foo(); // it is called directly!
    
}
}

echo B::bar1(); // "B"
echo B::bar2(); // "A"


Dmitry looked into it, but thought that having something work for
self:: and static:: but not for
parent:: or NAME:: would be ‘a little bit
inconsistent
‘. Still, the fact that there were clearly at least three
different ideas of the LSB concept meant the patch would be delayed again.
Stas suggested it might have been better ‘to spend time on figuring out
the concept and then do the patch than first do the patch and then discover
we don’t know how it was supposed to work’
.

Stefan Walk asked quietly whether it would be possible to distinguish between
parent:: and ParentClassName::? Stas wasn’t sure of
the difference and asked for an example, but Dmitry wrote thoughtfully that
it would indeed be technically possible to propagate LSB using
self::, parent:: and static:: and not
CLASS::. In fact, he thought this might be a good compromise. A
relieved Mike wrote that he was very much in favour, although he questioned
the appropriateness of having self:: work in that way. Marcus
also liked the idea, and wrote that it seemed the most natural solution.

Short version: Everything finally seems to be falling into
place.


TLK: Mark functions as const [continued]

Nuno Lopes glanced at Peter Brodensen’s post and was forced to agree that
strlen() wasn’t the best example to use when describing
deterministic functions. Johannes Schlüter disagreed; in PHP 5 there are
only binary strings and strlen() simply counts bytes, and in PHP
6 the script encoding is known at compile-time. The only edge case would
arise if someone used mbstring.func_overload to overwrite
strlen(). Nuno pointed out that that particular edge case is a
run-time setting that can affect a function, and in any such situation there
isn’t the option to replace the function call blindly with its value. He’d
intended to mark only functions that could be replaced blindly.

Niceties apart, Andrew Brampton wanted to know when
strlen('abcd') would be transformed in the following script:


$a = false; // To be set by external input

if ($a) {
    
$b = strlen('abcd');

}


Would the optimizer calculate the return value before it was known whether
the function would be called?

Nuno explained that this RFC is about whether to make the information
available, not about the optimizer itself. A naive optimizer would convert
strlen() there, yes – but that wouldn’t in itself be
problematic, since the result determined at that stage would be cached.

Short version: Pointless exercise or clever optimization? It
depends on the script.


FIX: Iconv holes

Sean Finney wrote to internals@ to ask about a missing fix for a security
issue in PHP 5.2.4 and earlier, href="http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2007-4840">CVE-2007-4840.
According to the report, various areas of ext/iconv were vulnerable to
DoS attacks. Sean had looked through CVS, but couldn’t find anything that
resembled a fix. Would anyone care to comment?

Stas would. He commented on the strangeness of having a glibc
bug listed as a PHP issue. That said, he guessed it would be possible to
impose a limit on charset length within PHP, at least… Sean
hadn’t realized the problem was in glibc, but conceded that it
wouldn’t be the first time PHP had been blamed for security problems in a
third-party library. Stas confirmed that he could reproduce this particular
security problem in pure C. Sean discovered he could do that too, and
promptly wrote to the security team at debian.org with the information. The
Debian team used it to close a couple of their own bug reports, but noted
that ‘the caller‘ (read: PHP) should be responsible for sanitizing
incoming charsets. Stas spent some time going through the
iconv extension at this point, and eventually pronounced the issue
fixed.

Short version: OK, so it was a little bit ‘ours’.


RFC: Additional information for op_arrays

PHP security expert Stefan Esser wanted to talk to ‘the major players in
bytecode caching tools’
, and chose to address them via the internals
list.

From time to time, he needed to store extra information for specific opcode
arrays. Although it’s already possible to use a reserved slot in the
op_array structure to store simple values, this approach can be
unreliable; Stefan gave the example of an old bug where APC had neglected to
request extra space from the Zend Engine, thereby overwriting the first few
slots. A second problem was that the amount of data that can be stored in
this way is restricted. Leaving a pointer there wasn’t an option either,
since this would break if the opcode array were shared among processes or
stored on disk. Stefan therefore proposed a modification to the
op_array structure that would allow extensions to append data of
an arbitrary size, capable of being cached alongside the rest of the
op_array data.

Derick Rethans liked the idea, but wanted to know how to go about using one
of the existing reserved slots. Stefan explained that you simply use the
integer returned by zend_get_resource_handle() to claim a slot
in the reserved area.

Returning to his proposal, the extension data could be manipulated using API
functions such as zend_op_array_*_data(); the one for adding
data would take flags as a parameter. The extended data would be
stored in a data structure, and the op_array struct would contain
a pointer to it. Any overhead would be restricted to the extension requiring
the extra data and the opcode caches that needed to cache it. Stefan believed
it could be accomplished by having caching software traverse the list of data
and store it all with the op_array; he didn’t believe it would
require many changes in the software.

Stas pointed out that it’s not actually possible to store pointers to disk in
memory. That said, there’s also no way to support arbitrary data without using
pointers, so there would need to be shared memory pointers in any case.
Further, the reserved slots in the op_array struct were intended
for general extension usage as well as for opcode caching; not all the data
stored there would need to be cached. Adding data that should be
cached would mean having to figure out how to deal with arbitrary data sizes
and structures that could be stored in shared memory and concurrently used.

Stas saw this as a ‘rather complex task‘, and he wasn’t even certain
it belonged in the Zend Engine. However, he had some questions. Would the
data be composed of an arbitrary length list of (void *, size)
pairs? If so, and if the opcode cache only knew the size, the data structure
would need to be in a position-independent form – which most C structures
using pointers are not. What would happen if there were no bytecode cache?
How should the data be retrieved from the cache? Finally, how would
modifications to the data be handled? Should the bytecode cache be notified
of the change, or should the cached data be immutable?

Stefan explained the problem he hoped to solve. His extension generates
metadata, for example “was generated by eval()‘d code”, “uses
variable XYZ“, or “uses session functionality”. The naive way to
link this information to the op_array would be to reserve a slot
and simply put a pointer in there. This works when there is no opcode cache,
because the extra data has to be regenerated every time anyway. It also works
when the opcode cache operates on a per process/thread basis, because the
pointer remains valid. However, it fails when an opcode cache stores data on
disk and loads it back; the pointer is invalid, or the metadata is never
generated. It also fails when an opcode cache shares data between processes,
again returning an invalid pointer.

Stefan believed that op_arrays in shared memory cannot be used
directly, but need to be copied into their own address space. If this
assumption was correct, the simplest solution might be to have some kind of
suspend/resume functionality for op_arrays. Extensions could
register a suspend hook, which would be called when the
op_array was put into shared memory. The resume
hook would then be called prior to reading from shared memory. However, his
initial idea had been more complicated… he’d imagined a
suspend hook that would create a binary data package, with the
opcode caching software then copying that binary data into shared memory or
writing it to disk. The resume hook would decode that data on
request.

To make it simple for the opcode caching software, PHP could handle the
association between data and extension internally; the caching software would
only need to know about zend_suspend_op_array(op_array, *buffer,
*size)
, zend_resume_op_array(op_array, buffer, size), and
storage for the extra data in a buffer of size bytes alongside
the op_array.

Short version: Concurrency + storage is a bit of a minefield.


CVS: Coverity reports get some attention

Changes in CVS that you should probably be aware of include:

  • Zend Engine build bug #42629
    (Dynamically loaded PHP extensions need symbols exported on MacOSX) was fixed
    [Jani Taskinen]
  • In ext/xmlrpc, bug
    #42189
    (xmlrpc_set_type() crashes PHP on invalid datetime
    values) was fixed [Ilia]
  • In the mbstring extension, href="http://bugs.php.net/42502">bug #42502 (GCC no longer implements
    <varargs.h>) was fixed [Rui Hirokawa]
  • In ext/xmlreader, bug
    #42139
    (XMLReader option constants are broken using XML())
    was fixed [Rob Richards]
  • Coverity issues #462 (invalid
    read when opendir() over FTP fails), #405 and #407 (memory leak
    on error), #390 (return value of getgroups() not checked for
    -1), #385 and #386 (no check for NULL) were all fixed in quick
    succession [Tony]
  • Coverity issues #398, #399 and
    #400 (memory leak inside pack()) , #401 and #402 (memory leak
    inside array_diff()), #355 (bad check for filename), #403 and
    #404 (memory leak inside array_intersect()) were fixed; all but
    the first three in 5_2 only [Ilia]

In other CVS news, Dmitry had been busy optimizing array functions. As a
result, array_intersect_key(),
array_intersect_assoc() and
array_uintersect_assoc() now run 100 times faster on large
arrays than before, and array_diff_key(),
array_diff_assoc() and array_udiff_assoc() are also
much improved.

Short version: Don’t look now, but somebody touched ext/xmlrpc!


PAT: ODBC and ext/filter

One Alexandra Shpindovsky had taken it upon herself to look into href="http://bugs.php.net/bug.php?id=37527">an ODBC bug, which is no bad
thing. It seems that the crash resulting from a request to a closed
connection was caused because odbc_close() only deletes the
statements associated with the connection, and not the connection itself.
Alexandra’s suggested fix was given href="http://news.php.net/php.internals/32368">in her post. Unfortunately
there is no maintainer for the odbc extension at present, so this
passed without comment.

Stas applied several patches during the course of the week. One, to limit the
length of the argument passed to dl(), came from Christian
Hoffman. The rest came from Mattias Bengtsson, who had evidently been
following Stas’ work on the iconv extension with eagle eyes.

And finally, someone named Arnaud posted a patch to fix ext/filter href="http://bugs.php.net/42718">bug #42718
(FILTER_UNSAFE_RAW not applied when configured as default
filter, even with flags). He’d noticed that php_sapi_filter()
intentionally bypasses this particular filter. Although
FILTER_UNSAFE_RAW does nothing by default, it is the only filter
able to optionally strip or encode special characters, and Arnaud felt that it
shouldn’t be bypassed when default_filter_flags is other than
0.

Pierre, responding via the bug report, explained that the
UNSAFE_RAW filter isn’t supposed to do anything.
FILTER_SANITIZE_STRING should do the job, given the correct
flags. He asked what Arnaud was trying to accomplish exactly. Arnaud’s
explanation – that he was trying to strip low ASCII characters from
GET/POST/COOKIE data – led Jani to
later back his complaint.

Short version: If Jani didn’t bogus it, it must be for real!

One Response to “Zend Weekly Summaries Issue #359”

  1. ostapk Says:

    There is a book titled "Learning PHP Data Objects" recently published by Packt: http://www.packtpub.com/Learning-PHP-Data-Objects-Open-Source/book

    A post at author’s blog for links to reviews: http://www.onphp5.com/article/58

    One reviewer suggests that this is the first book ever written entirely about PDO.