Categories


Loading feed
Loading feed

Zend Weekly Summaries Issue #359


TLK: PHP 5.3 feature list (summary)
TLK: PDO and MySQL
TLK: Late static binding
TLK: Mark functions as const [continued]
FIX: Iconv holes
RFC: Additional information for op_arrays
CVS: Coverity reports get some attention
PAT: ODBC and ext/filter

16th September - 22nd September 2007

TLK: PHP 5.3 feature list (summary)

Ilia Alshanetsky, still wearing his RM hat, announced the final tally of votes for features in PHP 5.3. He thanked the 28 people who had voted.

The voting options had been 1, 0, or -1, and Ilia had decided to kick out anything that didn't score at least 10. This left PHP 5.3 with:

  • namespace support
  • ext/intl
  • late static binding
  • SQLite 3 support in ext/sqlite
  • the mysqlnd library in the core
  • OpenID support in ext/openssl
  • new array_replace[_recursive]() functions
  • E_DEPRECATED
  • zend_arg_info_const()
  • ZEND_SIGNED_MULTIPLY_LONG() optimization
  • new INI parser/scanner
  • __callStatic()

Of the remaining 6 items, only the proposal to remove safe_mode et al in PHP 5.3 had a negative score.

Stas Malyshev wondered about those features with low scores, noting that -1 and +1 aren't the same thing as 0 and 0. He also wondered if there would be a PHP 5.2.5. Ilia dismissed the low score items, on the grounds that low scores reflected a lack of interest or consensus. Anything that hadn't got through this poll could be revisited at a later point release. As for 5.2.5, given that the agenda for PHP 5.3 is 'fairly busy', Ilia expected that PHP_5_2 would be the stable release branch for at least the next six months.

Lukas Smith re-jigged the data to make it easier on the eye and put the results on his wiki. He wrote that several items were difficult for the wider community to vote on due to a lack of expertise, something several voters had noted at the time. Given that factor, he found Ilia's blanket threshold 'questionable'. Lukas also asked everyone to check that their votes had been interpreted correctly, since some of the voting had been conditional. Ilia reiterated that anything without mainstream support was a niche feature in his view; he'd been trying to identify key features. In his view, introducing the rest would be tantamount to encouraging feature creep.

I posted an analysis of the failed features. Two - a switch in ext/mysqli and the GCC 4 -fvisibility patch - had no quorum; they drew only 3 and 8 votes respectively. Three - strict classes, GC and ext/phar - had far more positive than negative votes. The only item that was cleanly voted out was safe_mode and friends. It seemed obvious, to me, that the first two had failed purely because those not directly affected by them didn't understand them enough to express an opinion either way. Those who did understand them had overwhelmingly voted for them, but were in the minority. That left just three genuinely contentious items. The only question was when to debate those three.

Marcus Börger, pointing out that his baby was one of the three, asked me to drop the politics. I hadn't been trying to be political; it just didn't make sense to me to dictate which features could or couldn't go into an extension or specific build system purely on the level of outside interest. I agreed with Marcus that the three contentious items could probably wait. Marcus argued that if we were going the route of democracy then we should accept its decisions in full; the alternative was to have polls whose results are ignored. Ilia wrote that the point of "core" was common usage, not niche features. I argued that in that case, niche features should never have been on the list. Nuno Lopes backed me 100% in this.

Stas tried again; there is a big difference between -fvisibility and strict classes. One changes nothing within the language and benefits a handful of PHP extension builders. The other changes the language and adds a controversial feature, affecting all. Clearly, these are not equal. The latter can't really be decided by a simple vote anyway; it needs greater consideration, and should be discussed.

David Wang agreed with Stas. He wrote that the vote had been 'very helpful in narrowing the field of debate'; in his view, it meant there are now only three items open for discussion. He felt that high numbers of zero votes reflected a lack of information, and thought perhaps there should be a separate round of voting on those three issues after discussing them. David was fully aware that there hadn't been a widespread review of his GC patch, for example - which is stable, doesn't affect those not using it, doesn't affect performance, and would help make PHP a better choice for anything running longer than a Web script. He added that he'd been keeping the prerequisite macro patch up to date for so long it was becoming an embarrassment. Lukas noted that he, for one, hadn't understood until now that GC would not be enabled by default.

Derick Rethans, picking up on the exchange between Ilia and myself, explained to Ilia that it's not so simple as a lack of interest implying a niche feature. 'Not everything can be done in extensions'; besides, as Lukas had said, not everyone felt qualified to vote on all the items.

Short version: Lesson: find out how votes will be measured before casting yours.

TLK: PDO and MySQL

Lester Caine wondered how far into the future the PHP 5.3 effort would push PHP 6... More pertinently, he wondered how useful the mysqlnd library is, to those who never enable ext/mysql (read: Lester Caine).

Pierre explained that both ext/mysql and ext/mysqli can use mysqlnd, but added that he had 'serious doubts' about the motivation of the MySQL team to support PDO. Lars Gunther demurred, noting that the MySQL bloggers had written about ext/mysqli first, PDO second, and hadn't mentioned ext/mysql at all. He felt that the onus was on the PHP community to commit to PDO, for example by using PDO in introductory tutorials, blog posts and books. Besides, wasn't it the case that the PDO_MYSQL maintainers are not MySQL AB employees? Pierre pointed out that, historically, MySQL's attitude has been 'mysql first, then mysqli, and pdo "if it works well when you are done"'. George Schlossnagle, Wez Furlong and Ilia are listed as maintainers because they had done the groundwork on the PDO_MYSQL driver. (Pierre didn't say it, but this came about purely because MySQL AB refused at the time.) Then again, MySQL AB employees haven't been maintainers or initiators of any of the .NET database layers either...

Lukas intervened to announce that he was due to attend a developers' meeting in Heidelberg in the near future. He would try to get a commitment from MySQL AB to PDO development there and then.

Short version: May the force be with Lukas.

TLK: Late static binding

Etienne Kneuss had now tested Dmitry Stogov's late static binding patch, and wrote to Dmitry to alert him to a problem. The patch failed one of Etienne's tests, and he believed this might show a difference in their conception of LSB usage. In the following code:

class A {
    public static function
foo() {
        static::
who();
    }

    public static function
who() {
        echo
__CLASS__."\n";
    }
}

class
B extends A {
    public static function
test() {
        
A::foo();
    }

    public static function
who() {
        echo
__CLASS__."\n";
    }
}

B::test();

Etienne would expect the output to be A. Dmitry's patch returned B. He wasn't sure that it would be easy to fix Dmitry's patch to return A. On the other hand, it would be fairly straightforward to add the extra functionality in Dmitry's patch to his own. Etienne followed up with a link to his updated version.

Mike Lively agreed that that code should return A, but thought it equally important to be able to pass the name of the caller when dealing with inheritance. For example, if there were a child class that needed to extend an inherited method:

class Foo {
    const
TEST_CONST = 'foo';

    public function
test() {
        return static::
TEST_CONST;
    }
}

class
Bar extends Foo {
    const
TEST_CONST = 'bar';

    public function
test() {
        
//do class specific things
        //try to continue the function chain
        
return parent::test();
    }
}

it should have a way to return bar rather than, as now, foo. It would make the most sense to him if it were possible to get that return value by having

return static::test();

but that actually caused a segfault with Etienne's patch... Etienne pointed out that endless recursion was responsible for that segfault, but agreed that Mike had a point. However, he addeed, LSB is usually used to avoid redeclaring static methods in child classes. Mike replied that occasionally it's necessary to have the child class do a little extra, and 'if this breaks LSB with no workaround... that would be no good'.

Stas agreed with Etienne's original post; he also thought the return value there should be A. Dmitry agreed, too, when he caught up with his mail; he thanked Etienne for catching the problem. He'd now reviewed Etienne's latest patch, and found only one serious problem - it didn't handle nested calls. However, there were aspects of Etienne's work that Dmitry liked better than his own, and his latest version actually incorporated those, alongside several optimizations. He suggested that Etienne should also mix and match; they were aiming for the same behaviour, so the emphasis now was purely on speed.

Mike intervened again. He'd found 'a serious problem' with both patches, again focusing on inheritance:

class A {
    static public function
test() {
        echo
get_called_class()."\n";
    }
}

class
B extends A {
    static public function
test() {
        
// Performing additional needed tasks.
        
parent::test();
    }
}

In B::test(), it was not possible to identify the calling class. Mike felt that there should be a way to do this; either parent:: should forward the information or there should be a new keyword for it. Dmitry agreed ('Good catch!'), and added support for this to the patch. However, he was now uncertain which behaviour should be in the final version. Support for inheritance would certainly provide more flexibility, but it would also make the concept of late static binding harder to understand. Lukas thought that, since inheritance is an advanced OO concept anyway, keeping things consistent was more important than keeping them simple. Marcus agreed, given that inheritance is supported 'all over'. Dmitry then tried to clarify the expected behaviour. Should it be:

class A {
    function
foo() {
        return
get_called_class();
    }
}
class
B extends A {
    function
bar() {
        return
A::foo();
    }
}
class
C extends A {
    function
bar() {
        return
B::bar();
    }
}
echo
B::foo(); /* returns B */
echo B::bar(); /* returns B, because B::bar() calls A::foo() and A is parent of B */
echo C::bar(); /* returns B, because C::bar() calls B::bar() and B is not parent of C */

Or should the last two both return A?

Marcus preferred that they all return B, but Stas had apoplexy. How could it be right to call A::foo() and get B? If you call A::foo() you should always get A, regardless of where it is called. Given that B::bar() in this script always calls A::foo(), the last two lines should return A, in his view. Mike thought that if they all returned A, and if there was no way to overload a static function and forward the original called class to the parent, the usefulness of late static binding would be much diminished. Stas demanded to know what Mike meant by 'overloading a static function' and what this 'usefulness' would be useful for exactly. Mike obliged with a more concrete example. Stas looked at it, but didn't understand why Mike couldn't just replace a call to parent::buildSelectQuery() with a call to static::buildSelectQuery() to get the intended result. Dmitry explained: 'With static:: you are not able to call a method of the parent with the same name'. Stas gave up in disgust:

I'm starting to dislike all this idea of "let's have an object without
really having an object"... It's becoming messy now - we now need to
start to invent all kinds of complex propagation rules for it in
parallel with the true object propagation rules. Why not just have an
object then?

Alexey Zakhlestin didn't help matters by offering a third way (via a mail client that news.php.net can't deal with, grr):

class A {
    function
foo() {
        return
get_called_class();
    }
}

class
B extends A {
    function
bar1() {
        return
self::foo(); // it is called using inheritance-rules
    
}

    function
bar2() {
        return
A::foo(); // it is called directly!
    
}
}

echo
B::bar1(); // "B"
echo B::bar2(); // "A"

Dmitry looked into it, but thought that having something work for self:: and static:: but not for parent:: or NAME:: would be 'a little bit inconsistent'. Still, the fact that there were clearly at least three different ideas of the LSB concept meant the patch would be delayed again. Stas suggested it might have been better 'to spend time on figuring out the concept and then do the patch than first do the patch and then discover we don't know how it was supposed to work'.

Stefan Walk asked quietly whether it would be possible to distinguish between parent:: and ParentClassName::? Stas wasn't sure of the difference and asked for an example, but Dmitry wrote thoughtfully that it would indeed be technically possible to propagate LSB using self::, parent:: and static:: and not CLASS::. In fact, he thought this might be a good compromise. A relieved Mike wrote that he was very much in favour, although he questioned the appropriateness of having self:: work in that way. Marcus also liked the idea, and wrote that it seemed the most natural solution.

Short version: Everything finally seems to be falling into place.

TLK: Mark functions as const [continued]

Nuno Lopes glanced at Peter Brodensen's post and was forced to agree that strlen() wasn't the best example to use when describing deterministic functions. Johannes Schlüter disagreed; in PHP 5 there are only binary strings and strlen() simply counts bytes, and in PHP 6 the script encoding is known at compile-time. The only edge case would arise if someone used mbstring.func_overload to overwrite strlen(). Nuno pointed out that that particular edge case is a run-time setting that can affect a function, and in any such situation there isn't the option to replace the function call blindly with its value. He'd intended to mark only functions that could be replaced blindly.

Niceties apart, Andrew Brampton wanted to know when strlen('abcd') would be transformed in the following script:

$a = false; // To be set by external input

if ($a) {
    
$b = strlen('abcd');

}

Would the optimizer calculate the return value before it was known whether the function would be called?

Nuno explained that this RFC is about whether to make the information available, not about the optimizer itself. A naive optimizer would convert strlen() there, yes - but that wouldn't in itself be problematic, since the result determined at that stage would be cached.

Short version: Pointless exercise or clever optimization? It depends on the script.

FIX: Iconv holes

Sean Finney wrote to internals@ to ask about a missing fix for a security issue in PHP 5.2.4 and earlier, CVE-2007-4840. According to the report, various areas of ext/iconv were vulnerable to DoS attacks. Sean had looked through CVS, but couldn't find anything that resembled a fix. Would anyone care to comment?

Stas would. He commented on the strangeness of having a glibc bug listed as a PHP issue. That said, he guessed it would be possible to impose a limit on charset length within PHP, at least... Sean hadn't realized the problem was in glibc, but conceded that it wouldn't be the first time PHP had been blamed for security problems in a third-party library. Stas confirmed that he could reproduce this particular security problem in pure C. Sean discovered he could do that too, and promptly wrote to the security team at debian.org with the information. The Debian team used it to close a couple of their own bug reports, but noted that 'the caller' (read: PHP) should be responsible for sanitizing incoming charsets. Stas spent some time going through the iconv extension at this point, and eventually pronounced the issue fixed.

Short version: OK, so it was a little bit 'ours'.

RFC: Additional information for op_arrays

PHP security expert Stefan Esser wanted to talk to 'the major players in bytecode caching tools', and chose to address them via the internals list.

From time to time, he needed to store extra information for specific opcode arrays. Although it's already possible to use a reserved slot in the op_array structure to store simple values, this approach can be unreliable; Stefan gave the example of an old bug where APC had neglected to request extra space from the Zend Engine, thereby overwriting the first few slots. A second problem was that the amount of data that can be stored in this way is restricted. Leaving a pointer there wasn't an option either, since this would break if the opcode array were shared among processes or stored on disk. Stefan therefore proposed a modification to the op_array structure that would allow extensions to append data of an arbitrary size, capable of being cached alongside the rest of the op_array data.

Derick Rethans liked the idea, but wanted to know how to go about using one of the existing reserved slots. Stefan explained that you simply use the integer returned by zend_get_resource_handle() to claim a slot in the reserved area.

Returning to his proposal, the extension data could be manipulated using API functions such as zend_op_array_*_data(); the one for adding data would take flags as a parameter. The extended data would be stored in a data structure, and the op_array struct would contain a pointer to it. Any overhead would be restricted to the extension requiring the extra data and the opcode caches that needed to cache it. Stefan believed it could be accomplished by having caching software traverse the list of data and store it all with the op_array; he didn't believe it would require many changes in the software.

Stas pointed out that it's not actually possible to store pointers to disk in memory. That said, there's also no way to support arbitrary data without using pointers, so there would need to be shared memory pointers in any case. Further, the reserved slots in the op_array struct were intended for general extension usage as well as for opcode caching; not all the data stored there would need to be cached. Adding data that should be cached would mean having to figure out how to deal with arbitrary data sizes and structures that could be stored in shared memory and concurrently used.

Stas saw this as a 'rather complex task', and he wasn't even certain it belonged in the Zend Engine. However, he had some questions. Would the data be composed of an arbitrary length list of (void *, size) pairs? If so, and if the opcode cache only knew the size, the data structure would need to be in a position-independent form - which most C structures using pointers are not. What would happen if there were no bytecode cache? How should the data be retrieved from the cache? Finally, how would modifications to the data be handled? Should the bytecode cache be notified of the change, or should the cached data be immutable?

Stefan explained the problem he hoped to solve. His extension generates metadata, for example "was generated by eval()'d code", "uses variable XYZ", or "uses session functionality". The naive way to link this information to the op_array would be to reserve a slot and simply put a pointer in there. This works when there is no opcode cache, because the extra data has to be regenerated every time anyway. It also works when the opcode cache operates on a per process/thread basis, because the pointer remains valid. However, it fails when an opcode cache stores data on disk and loads it back; the pointer is invalid, or the metadata is never generated. It also fails when an opcode cache shares data between processes, again returning an invalid pointer.

Stefan believed that op_arrays in shared memory cannot be used directly, but need to be copied into their own address space. If this assumption was correct, the simplest solution might be to have some kind of suspend/resume functionality for op_arrays. Extensions could register a suspend hook, which would be called when the op_array was put into shared memory. The resume hook would then be called prior to reading from shared memory. However, his initial idea had been more complicated... he'd imagined a suspend hook that would create a binary data package, with the opcode caching software then copying that binary data into shared memory or writing it to disk. The resume hook would decode that data on request.

To make it simple for the opcode caching software, PHP could handle the association between data and extension internally; the caching software would only need to know about zend_suspend_op_array(op_array, *buffer, *size), zend_resume_op_array(op_array, buffer, size), and storage for the extra data in a buffer of size bytes alongside the op_array.

Short version: Concurrency + storage is a bit of a minefield.

CVS: Coverity reports get some attention

Changes in CVS that you should probably be aware of include:

  • Zend Engine build bug #42629 (Dynamically loaded PHP extensions need symbols exported on MacOSX) was fixed [Jani Taskinen]
  • In ext/xmlrpc, bug #42189 (xmlrpc_set_type() crashes PHP on invalid datetime values) was fixed [Ilia]
  • In the mbstring extension, bug #42502 (GCC no longer implements <varargs.h>) was fixed [Rui Hirokawa]
  • In ext/xmlreader, bug #42139 (XMLReader option constants are broken using XML()) was fixed [Rob Richards]
  • Coverity issues #462 (invalid read when opendir() over FTP fails), #405 and #407 (memory leak on error), #390 (return value of getgroups() not checked for -1), #385 and #386 (no check for NULL) were all fixed in quick succession [Tony]
  • Coverity issues #398, #399 and #400 (memory leak inside pack()) , #401 and #402 (memory leak inside array_diff()), #355 (bad check for filename), #403 and #404 (memory leak inside array_intersect()) were fixed; all but the first three in 5_2 only [Ilia]

In other CVS news, Dmitry had been busy optimizing array functions. As a result, array_intersect_key(), array_intersect_assoc() and array_uintersect_assoc() now run 100 times faster on large arrays than before, and array_diff_key(), array_diff_assoc() and array_udiff_assoc() are also much improved.

Short version: Don't look now, but somebody touched ext/xmlrpc!

PAT: ODBC and ext/filter

One Alexandra Shpindovsky had taken it upon herself to look into an ODBC bug, which is no bad thing. It seems that the crash resulting from a request to a closed connection was caused because odbc_close() only deletes the statements associated with the connection, and not the connection itself. Alexandra's suggested fix was given in her post. Unfortunately there is no maintainer for the odbc extension at present, so this passed without comment.

Stas applied several patches during the course of the week. One, to limit the length of the argument passed to dl(), came from Christian Hoffman. The rest came from Mattias Bengtsson, who had evidently been following Stas' work on the iconv extension with eagle eyes.

And finally, someone named Arnaud posted a patch to fix ext/filter bug #42718 (FILTER_UNSAFE_RAW not applied when configured as default filter, even with flags). He'd noticed that php_sapi_filter() intentionally bypasses this particular filter. Although FILTER_UNSAFE_RAW does nothing by default, it is the only filter able to optionally strip or encode special characters, and Arnaud felt that it shouldn't be bypassed when default_filter_flags is other than 0.

Pierre, responding via the bug report, explained that the UNSAFE_RAW filter isn't supposed to do anything. FILTER_SANITIZE_STRING should do the job, given the correct flags. He asked what Arnaud was trying to accomplish exactly. Arnaud's explanation - that he was trying to strip low ASCII characters from GET/POST/COOKIE data - led Jani to later back his complaint.

Short version: If Jani didn't bogus it, it must be for real!

Comments


Monday, December 3, 2007
PDO BOOK
1:14PM PST · ostapk