Zend Weekly Summaries Issue #368

TLK: Taint mode decision
TLK: Late static binding
REQ: Type hinting of class properties
REQ: PECL/core agenda
REQ: Better exception error handling
TLK: WSDL load error
TLK: Disabling the built-in POST handler
TLK: Cleanup and maintenance offer
TLK: Optional scalar type hinting [continued]
REQ: End of support notices
BUG: String parser changes
CVS: Ternary shortcut in 5_3
PAT: PDO antics, and a bit of constifying

18th November – 24th November 2007

TLK: Taint mode decision

Stefan Esser wanted to know if there’d ever been a firm decision that Wietse
Venema’s taint mode will go into the PHP core. He wrote that there are
currently two taint implementations available; GRASP, by Coresecurity, which
has working byte level tainting but is slow, and Wietse’s, which is faster
but which – Stefan claimed – has a broken design and is insecure. Stefan
added bluntly that he disliked the idea of a taint mode in PHP precisely
because it’s not possible to have an implementation that is both secure and
fast. He went on to give some examples of wrong assumptions made in Wietse’s
implementation:

  • $_SERVER['PHP_SELF'] is not made safe and allows XSS (and
    more) in many applications
  • "SELECT * FROM table WHERE id=".mysql_real_escape_string($id)
    is NOT secure but will result in no taint warning
  • echo '....<sometag style="some-attribute: ',htmlentities($user_input),'">'
    will allow XSS through the style attribute without a taint warning
  • echo '....<img src="',htmlentities($user_input),'">' will
    allow XSS through javascript: URL (e.g. in Opera) without a
    taint warning

Nuno Lopes quickly reassured Stefan that no decision had been made yet. He
also didn’t like the idea of a taint mode in the core, not because of any
specific weakness in one implementation or another, but because he saw it as
a third-party tool. He wanted to know whether Stefan had found any real world
exploitable bugs with GRASP?

Dan Scott wrote that, since it had already been ‘mostly agreed’ that taint
mode should only be used in development, the speed of the implementation was
largely irrelevant. Stefan retorted that this wasn’t the problem; the problem
was that neither variable-based nor byte level tainting can offer complete
security, even where the test environment has (hypothetically) full code
coverage. For example, given:


$sql['id'] = mysql_real_escape_string($_GET['id']);
$query = "SELECT * FROM table WHERE id=" . $sql['id'];


Wietse’s taint mode would consider it safe, and GRASP would ignore the user
supplied data in the SQL query because it’s numeric.

Lukas Smith, pointing out that there’s no such thing as 100% security for
anything that allows user access, wanted to know whether Stefan was against
the use of taint models altogether or just against bundling them? He saw
taint as purely a development tool, in which case the only question should be
whether either of the proposed solutions is ready for use. Lukas also wondered
how other languages solve the problem of taint? How did Ruby’s taint model
work, and were there any other languages that had one? Stas Malyshev replied
that Perl has taint; he believed it to be variable-based. Stefan explained
that, in Perl, the developer can only use tainted input by explicitly calling
untaint() on it. This wasn’t the same thing as implicit
untainting, where functions like htmlentities() and
mysql_real_escape_string() can mark data ‘untainted’ without
regard to context. Stas corrected him; applying any regexp in Perl does just
that. Lukas, though, agreed that this constitutes a fundamental difference,
and came up with the idea of making bytecode caches smart enough to strip out
untaint() calls in production. That said, most people would
probably appreciate a tool that just does the job without their having to
alter their code.

Stas likened taint mode to an alarm clock; ‘it can wake you up, but can’t
ensure you actually will go to work and do something productive there.’ He
didn’t see how this would make the idea of tainting worthless.

Wietse argued that, in his code, there is actually contextual
awareness, e.g. htmlentities() only marks data as safe for HTML.
He asked Stefan for an explicit example of a case where


"SELECT * FROM table WHERE id=".mysql_real_escape_string($id)

wouldn’t be safe. When it came to Javascript execution, Wietse explained that
he was still working on it; he had yet to realistically simulate every browser
out there. He didn’t think that meant he should give up trying to warn people
about known bad coding practices; it just meant he couldn’t warn them against
all of those.

Stefan admitted that he’d have to search for a good example, but wrote that
the bigger problem in Wietse’s implementation was that code like:


"SELECT * FROM table WHERE id=$id"

would always advise the developer to use
mysql_real_escape_string(). Since doing so will mark
$id as untainted, the developer is given the wrong message about
how to make user data secure. Wietse agreed that the warning message could be
improved, but explained that the idea was simply to help the programmer to do
the right thing, rather than to guarantee it.

PHP user Troels Knak-Nielsen pointed out that something intended for testing
wouldn’t be turned on by default, and wondered if a tool like php-sat might
provide a better solution? Wietse carefully explained the difference between
static analysis and run-time taint analysis.

Ezequiel Gutesman of the GRASP development team chose that moment to
introduce himself to the internals list. He explained that GRASP was designed
to be used in production, and with that in mind, false alarms, whether
positive or negative, are unacceptable. The main design aim was to block
ongoing attacks, rather than to warn the developer about insecure coding
habits. The GRASP team were, therefore, all for Wietse’s taint mode.

Going back to Stefan’s earlier claim that numeric characters in
$_GET['id'] won’t raise an alarm, Ezequiel explained that the
GRASP team hadn’t found a way to perform an SQL injection attack using only
numeric characters. Markus Fischer couldn’t see anything in the original
example that forces $id to actually be a numeric value, and linked to an
article describing how to exploit such code even if
mysql_real_escape_string() is called.
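
Markus’s point can be illustrated with a minimal sketch. It is not taken from the
linked article; it assumes addslashes() as a stand-in for
mysql_real_escape_string() so that it runs without a database connection, and
the input value is invented for illustration. Escaping only protects data
placed inside a quoted string literal, so an unquoted “numeric” parameter
passes through unchanged:

```php
<?php
// Hypothetical attacker-supplied value: no quotes, so nothing to escape.
$id = '0 OR 1=1';

// Assumption: addslashes() stands in for mysql_real_escape_string() here,
// purely so the sketch runs without a MySQL connection.
$escaped = addslashes($id);

// The value is not wrapped in quotes in the SQL text, so the injected
// condition becomes part of the query.
$unsafe = "SELECT * FROM table WHERE id=" . $escaped;
echo $unsafe, "\n";

// Forcing the value to be numeric removes the injected text.
$safe = "SELECT * FROM table WHERE id=" . (int) $id;
echo $safe, "\n";
```

Casting (or otherwise validating) the value at the point of use is one way to
make the numeric assumption explicit; a taint mechanism that treats the escaped
string as clean would not flag the first query.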

David Zülke quoted Yoda: “Do, or do not. There is no try.” – but this
approach was comprehensively dismissed by Lukas (‘The only secure application
is one that hasn’t been deployed anywhere’) and Stas (‘By your logic, all
security solutions and all testing are useless’). David clarified; he only
meant that it was impossible to cover all the potential security issues with
a taint mode feature, and that attempting to do so would give inexperienced
users a false sense of security. An explicit untaint() approach would make
far more sense to him than ‘some implicit guessing magic that, once again,
means people are gonna switch their brains off’. Stefan Priebsch didn’t care
about the people that switch their brains off; whatever happened, those
people would always write ‘crappy code’. He did, however, care about having
taint mode in PHP, because it would help him make his own code a little more
secure with next to no effort.

Christian Schneider wrote thoughtfully that most people seemed to agree that
taint mode would be a valuable tool for themselves, with the only
major drawback in Wietse’s approach being the likely perception that no taint
warnings indicate secure code. As far as Christian’s own code is concerned,
the mechanism provided by ext/filter is at the wrong end of the chain;
he needs data to be left in its original state for as long as possible and
sanitized only at the point of use. For him, Wietse’s solution would
therefore be extremely useful; not having the tool would seem a great loss.

Short version: Taint marks themselves are less of an issue than the
untainting model.

TLK: Late static binding

Gergely Hodicska (I have to confess to uncertainty about the name – the
sign-off doesn’t match) wanted to know if there’d been any decision about the
behaviour of inheritance in late static binding? He wanted to be able to do
this:


class ActiveRecord {
    public static function findByPk($id) {
        $calledClass = get_called_class();
    }
}

class Blog extends ActiveRecord {
    public static function findByPk($id) {
        parent::findByPk($id);
    }
}

Blog::findByPk(1);



but currently, the value of $calledClass is ActiveRecord.

Etienne Kneuss wrote that this was discussed some time ago.
It all came down to whether fully established calls should break the
resolution or not; both approaches had drawbacks. Implementing this would
mean introducing another keyword and further complications. Gergely (we’ll
assume) had found that thread, but not a conclusion. He believed his example
was a legitimate use case, and found the current behaviour confusing.
Johannes Schlüter quickly responded that the current behaviour felt
right to him, since the call to parent::findByPk() is an
independent call to an explicit class. Mike Lively disagreed, arguing that
this behaviour makes it completely impossible to engage any more complex form
of inheritance or cope with decorators. He also found it inconsistent with
instance inheritance; wasn’t the whole point of LSB to give static calls the
same flexibility as instance calls? While agreeing that
<class name>::<static method>() should break the caller chain, surely
parent:: should just forward the called class? If you actually wanted the
parent, you could just use the class name.

Stas argued that if Mike wanted objects, he should ‘use the real thing’. LSB
was supposed to resolve an explicit problem, that being the
inability to distinguish A::method() from
B::method() when B extends A. Besides,
‘more complex’ is not always better. Mike pointed out that he had been among
those making the original feature request, and while resolving that explicit
problem may well have been the goal of the patch authors, it certainly hadn’t
been his. He’d seen LSB as a way to provide a flexible inheritance model for
statics, thereby killing the need to instantiate objects for the sole purpose
of instantiating other objects (e.g. in factories); he saw this as fundamental
to OO design. Although that goal had been met, the inability to re-implement
following class extension was a severe limitation. Stas disagreed; the
missing feature had been some way to find the name of the true calling class,
and this had now been provided. He saw no reason to add further complications
to the language.

Gergely asked simply whether Stas had found his example too ‘complex’? He
believed a lot of users would be confused by that result. Stas essentially
argued that those users should RTFM, particularly the definitions of
self:: and parent::. Gergely posted a link to his blog, which contains more
code examples, and asked whether the behaviour
in the last two code blocks on that page wasn’t confusing. Just because
something could be explained in the manual didn’t make it the best solution.
Lukas wrote cautiously that Stas had probably meant the example code should
use self:: rather than parent::. Alexey Zakhlestin
pointed out that that meant calling this exact method, rather than the parent
class method. Lukas suggested that adding the class name as a second parameter
to the parent method would suffice. Alexey agreed this was possible, but
didn’t see it as a good solution. The ‘least-surprise’ solution would be to
have it work the way Gergely and Mike had suggested. Lukas disagreed; not
only did he see that as less intuitive, it would also introduce backward
compatibility issues. Adding some new magic constants (he had
__SELF__ in mind) should be enough to resolve the problem,
leaving self:: for more complex operations. Gergely disputed
that it would introduce BC issues, since both __CLASS__ and
get_class() work as they always did. Mike agreed, claiming that
the only thing affected would be the resolution of static::,
which has never been in a PHP release.

Jochem Maas, who evidently didn’t have a current PHP snapshot installed,
asked for clarification. Given:


class A {
    static function find() {}
}

class B extends A {}

B::find();


could A::find() tell that it was called as
B::find()? Stas replied, if a little wearily, that that is
exactly what LSB does. Mike explained that it breaks down when you want to
specialize B::find(), ‘because parent:: is considered an explicit class name
reference’. Stas retorted that it is possible to specialize the method, just
not by using parent::. He recommended using the return value of
get_called_class() to call the required method, but Mike still felt there was
‘a disconnect somewhere’. If Stas was suggesting something like:


static public function test() {
    parent::test(get_class());
}


Mike could see two problems with it. Firstly, this is already possible in
PHP, so why bother with LSB? Secondly, and more importantly, this kind of
loose inheritance with statics wasn’t supported in PHP 6 last time he
checked. Stas asked what Mike wanted parent::test() to mean, if
not “the method test in the parent class of the class where this
statement is”. He wasn’t sure what Mike meant by ‘loose inheritance’. Mike
downloaded PHP 6 before responding, and found himself in the wrong about
that. However, he explained that by ‘loose inheritance’ he meant the ability
to extend and overwrite a method while changing that method’s parameter list.
In the situation where B::test() was called to start the call
chain, he wanted the B::test() method to decorate
A::test(), and this just isn’t possible at present. Support for
it would either mean introducing a new keyword or allowing
parent:: to forward the called class.
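
The two uncontested parts of the discussion (an inherited static method can
see the class it was called through, and a call through an explicit class name
resets that information) can be sketched as follows. The class and method
names are made up for illustration, and the results noted in the comments
reflect released PHP versions that ship get_called_class(), which may differ
from the snapshot being debated at the time:

```php
<?php
class A {
    public static function find() {
        // Name of the class the static call was made through.
        return get_called_class();
    }

    public static function viaExplicitName() {
        // An explicit class name is a fully resolved call, so the
        // called class seen inside find() becomes A.
        return A::find();
    }
}

class B extends A {}

echo B::find(), "\n";            // B: the inherited method sees the caller
echo A::find(), "\n";            // A
echo B::viaExplicitName(), "\n"; // A: the explicit name breaks the chain
```

The disagreement in the thread is only about which of these two behaviours
parent:: should follow.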

Richard Quadling and Marco Kaiser turned up at this point and exchanged huge
long tracts of code, the former to express how great LSB is and the latter to
demonstrate the problems he was having with it. Marcus Börger put an end
to these antics with a couple of demonstrations of
LSB usage.

Short version: The actual problem is solved, but the extent of the
solution is too limited for the OO folk.

REQ: Type hinting of class properties

Baptiste Autin went where angels fear to tread and asked whether there is any
hope of seeing class properties and return values type hinted one day… as an
option. He saw it as a ‘half-done job’ in PHP 5, where only
function parameters have the feature. Design patterns make much use of
composition, which would be more readable with type hinting. Model-driven
reverse-engineering tools would also find it useful.

While he was at it, Baptiste wondered about the possibility of having the
superglobal arrays stored in system classes too…

Short version: Another Java escapee joins the list.

REQ: PECL/core agenda

Following up on Gaetano Giunta’s crusade to name and shame those extensions
that lack versioning information, Lukas Smith
wrote that he believed the relationship between PECL and core generally –
including extension versioning – should be ‘very high up on our agenda.’ He
asked if someone actively maintaining an extension both in PECL and the PHP
core would be prepared to write up a proposal that could act as a basis for
discussion. Alternatively, perhaps those developers meeting in Paris could
find time to sit down and talk about this?

Short version: And pigs may fly…

REQ: Better exception error handling

PHP user Ken Stanley wanted to check that he hadn’t missed anything before
logging a feature request. He was using set_exception_handler()
to standardize a project’s error response to all exceptions. Unthinkingly, he
had written a View class that throws an Exception if
no view is found. Ken felt that the resulting error message:


Fatal error: Exception thrown without a stack frame in Unknown on line 0

could be improved upon.

Ken took some pains to explain that he does understand why the error
occurred, and had now corrected his code to use trigger_error()
rather than throw an exception. However, he had noticed other PHP users
coming across the same problem without an easy solution. Long story short,
Ken felt that providing a filename – and perhaps even a line number – in the
error message to show where the last exception was thrown would be useful.
However, he didn’t know whether this had already been implemented in a more
recent version of PHP, and lacked the skills to find out from source. He also
didn’t know whether it was even possible to provide that information in the
error message. Finally, Ken didn’t know whether he should re-open bug #31304, which appeared to be a
request for the same thing, or create a new report.

Alexey Zakhlestin and Evert|Rooftop both chimed in with a recommendation that
Ken use a try/catch block rather than set_exception_handler(),
which should really only be used for debugging. Ken agreed, but pointed out
that this had nothing to do with the questions he was actually asking. He
apologized that he hadn’t been clear enough in his first post, and explained
the whole thing over again.

Tony Dovgal explained that the execution phase is finished at the point when
exception handlers and shutdown functions are called. Since no script is
being executed, there is no filename or line number information available for
the error message to use. Ken thanked him for his response, but wondered if it
wouldn’t be possible to temporarily store that information to pass to the
exception handler?

Edward Z. Yang saw the whole thing as a documentation problem, pointing out
that set_exception_handler() is ‘often touted as an easy way to define a
global try/catch block’. It wasn’t designed to call a complex
subsystem to render the error; it occurs during the destruct phase. He felt
the documentation should at least mention that throwing an exception from
inside the handler will result in a fatal error, and probably also encourage
the use of a global try/catch block.
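
A global try/catch block of the kind Edward recommends might look like the
sketch below. The functions render_view() and run_application() are
hypothetical stand-ins for Ken’s View class and application entry point; the
point is only that, inside a catch block, the exception still carries its file
and line information:

```php
<?php
// Hypothetical stand-in for a view loader that fails.
function render_view($name) {
    throw new Exception("View '$name' not found");
}

// Hypothetical application entry point.
function run_application() {
    render_view('missing');
}

try {
    run_application();
} catch (Exception $e) {
    // The script is still executing here, so the origin of the
    // exception can be reported normally.
    $report = sprintf(
        "%s in %s on line %d",
        $e->getMessage(),
        basename($e->getFile()),
        $e->getLine()
    );
    echo $report, "\n";
}
```

Unlike a handler registered with set_exception_handler(), the catch block runs
while the script is still executing, so anything it throws can itself be
caught further up.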

Short version: Shutdown stuff’s always entertaining.

TLK: WSDL load error

Nick Loeve had noticed that bug #42773 (WSDL error causes HTTP 500
Response) had been fixed in a very
literal way in PHP 5.2.5. To his mind, the real problem was that a failed
attempt to load a WSDL should raise an exception, and not a fatal error.
Should he open a new bug report about this?

Alexey pointed out that exceptions are generally not thrown from core PHP
except during object construction. Nick pointed out that the constructor for
SoapClient specifically allows the user to request exceptions for SoapFaults.
Was this not a SoapFault? Lukas believed that it wasn’t, since a SoapFault is
something thrown by the SOAP service and the WSDL read error occurs before
the SOAP service is involved. That said, he agreed with Nick that it
shouldn’t be a fatal error.

Nick meanwhile had been playing around and made the discovery that attempting
to load a WSDL that doesn’t exist throws two fatal errors – one a
SOAP-ERROR saying the WSDL cannot be loaded, and the other an uncaught
SoapFault exception with a faultString that says the same thing. Wrapping the
constructor in a try/catch block allowed him to catch the exception, but of
course did not prevent the fatal error. Nick had by now found bug #34657, closed as bogus
because it appeared to be an Xdebug issue, and yet the offending fatal error
was still in php_sdl.c:


wsdl = soap_xmlParseFile(struri TSRMLS_CC);

if (!wsdl) {
    soap_error1(E_ERROR, "Parsing WSDL: Couldn't load from '%s'", struri);
}

Should he open a new report, request that one of the previous reports be
re-opened, or accept that this behaviour is never going to change? Nick
didn’t mind trying to write a patch to fix the problem, but had found a
number of other operations in the SOAP extension that error out in this way.

Short version: Maybe a check for the exception option would be wise.

TLK: Disabling the built-in POST handler

Following a brief discussion on the php-general mailing list, Stefanos
Stamatis had tracked
down the rfc1867_post_handler() function in
main/rfc1867.c. He found that this function checks whether the posted
content length exceeds the value of the post_max_size INI
directive. If it does, an E_WARNING is thrown and the function
is aborted. Ergo, by overriding post_max_size (which can be
achieved from Apache configuration using php_value) it is
possible to disable the built-in POST handler. The only thing
Stefanos didn’t know was whether this was expected behaviour that could be
relied upon.

Hannes Magnusson wrote simply ‘Yes’, and added a link to the manual entry, which
states that post_max_size has been an INI_PER_DIR
setting since PHP 4.2.3.

Richard Quadling, who takes his documentation duties very seriously, asked
whether the manual entry shouldn’t also state that a
post_max_size setting of zero will inhibit $_FILES?

Short version: Another PHP user gets to learn C the hard way.

TLK: Cleanup and maintenance offer

Andy Lester, ‘usually a Perl person’, introduced himself on the
internals list. He had been helping clean up the internal code in Perl 5 and
Parrot over the past few years, and – having looked through the PHP_5_2
sources – wondered if the PHP crew would also appreciate help in this. The
specific areas he would look at included using const qualifiers
on core functions and variables where possible and minimizing variable scope.
He had written ‘a guide to the benefits of consting’ in the context of the
Parrot project, and shared the link to ensure that everyone knew what he was
talking about here.
Would the PHP core team be interested, and if so where should Andy begin?

Tony explained that the PHP_5_2 branch is actually bugfix-only, and the work
to ‘constify’ had started a few months ago in the current branches. Andy saw
this as a good sign, but
noted that there are still plenty of other sectors of code to hit. He also
wondered if compiler warning levels shouldn’t be raised; the PHP build
doesn’t default to running with -Wall under GCC. Tony pointed
out that actually it does, but only if you --enable-debug.

Marcus Börger was happy to get a constify-ing offer, and added that
killing off TSRMLS_FETCH() where possible would also be good.
This being highly PHP-specific, he added an example (but no explanation) in his post. As for where
to begin, Marcus suggested that Andy start the same way everyone else on the
team did, by providing patches against CVS HEAD and PHP_5_3; assuming all went
well, he’d soon have CVS access.

Short version: What, no more Perl jokes?

TLK: Optional scalar type hinting [continued]

Derick picked up Sam Barrow’s request for optional scalar type hinting, and –
surprisingly – wrote in to back it. He knew, though, that his ‘quick hack’ was
not the best implementation. Cristian Rodriguez was there too, so long as it
wasn’t mandatory. David Coallier mentioned support for basic types as objects,
but even Sam didn’t think that was a good idea. Hannes described it
as ‘the worst idea I’ve heard on internals for over a month’, and demonstrated
that you can do this in userland if you need to anyway. David pointed out
that it’s okay to just
say you don’t like the idea… however, given its reception, he guessed it
would be just as easy to implement it in an extension.

Sam explained that he both liked and disliked the loose typing in PHP; it
makes the language easy, which he liked, but the lack of strictness also
allows undetected errors, which he liked less. To him, it would make perfect
sense to evolve PHP into a hybrid, where typing is dynamic but still
controllable via type hinting, and mix-and-match parameters are allowed.
Richard Quadling came up with the idea of having type hinting generate an
E_NOTICE (presumably, when the type of a passed variable is wrong). Alexey
liked that idea, but thought E_STRICT would better fit the bill. Sam liked it
too; wasn’t there currently a fatal error for this? ‘We could just turn it
into an E_NOTICE or E_WARNING’.

Marcus saw the autoboxing aspect of the discussion and recommended they all
look into pecl/SPL_Types, which provides a base implementation
already.

Short version: It’d be interesting to hear what the Engine
gurus have to say on the matter.

REQ: End of support notices

Marcus put in a plea for official RM announcements of the end of support for
PHP 4, PHP 5.0 and PHP 5.1 on the php.net home page as of 31st December 2007.
Tony agreed that it should be made as clear as possible that other versions
are no longer supported, given that the 5_2 branch will be ending soon. Ilia
Alshanetsky pointed out that it won’t be all that soon – the 5_2
branch has to stick around at least until the 5_3 branch is stable – but
agreed that there should be end-of-life announcements for the long-dead
PHP_5_0 and PHP_5_1 branches.

Rasmus Lerdorf referred Marcus to the php.net homepage, which has carried an
end-of-life announcement for PHP 4 for some months now. Marcus replied
blithely that this means simply adding the other two, but Derick Rethans
didn’t see a good reason not to repeat it. Tomas Kuliavas grumbled that there
is actually an eight month difference between the date on that end-of-life
announcement and the date proposed by Marcus. Besides, as he recalled it the
decision had only affected PHP 4 in the first place. Ilia repeated that the
older PHP 5 versions have long been discontinued; all that was being
suggested now was that this information should be made public.

Hannes noted that there is actually an item about unsupported historical releases on php.net
already, but agreed that putting something on the front page would be a good
idea.

Short version: Betcha nobody actually remembered to do this.

BUG: String parser changes

Someone named Serge had discovered a change in the way strings are parsed in
PHP 5.2.5. The sequences \f and \v are now special,
and are parsed as FF and VT symbols. This made no
sense to him, since it broke backward compatibility and was likely to affect
many scripts. Those particular symbols are rarely used, and he didn’t see how
the feature could be useful to many developers. Furthermore, Serge had checked
the documentation and found the change isn’t even mentioned there; worse, in
the documentation, escaping backslashes only when necessary is encouraged. He
therefore asked that the change be rolled back.

Edward Z. Yang referred Serge to the bug report that had sparked the change,
and shared his opinion that stray backslashes in double-quoted strings should
always be escaped. The documentation had actually been updated; the mirror
Serge was using was probably out of date. Hannes explained that none of the
documentation mirrors are up to date at present; ‘Our build master is MIA,
the only mirror that is up to date is http://docs.php.net’. Serge pointed
out that if the manual has stated in the past that only certain characters
are escaped while others are not, that behaviour should never be changed
because doing so will break existing code. Moreover, you still can’t
use \f or \v in application code because any
version older than PHP 5.2.5 will treat it in a different way. He simply
didn’t understand why support for such esoteric symbols had been added in the
first place.
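
The behaviour under discussion, and Edward’s advice to escape stray
backslashes, can be seen in a short sketch (the variable names are
illustrative only). On PHP 5.2.5 and later, \v and \f inside double quotes
each become a single control character; single-quoted strings and escaped
backslashes give the same result on every version:

```php
<?php
$double  = "a\vb";   // PHP >= 5.2.5: 'a', vertical tab, 'b'
$single  = 'a\vb';   // always a literal backslash followed by 'v'
$escaped = "a\\vb";  // backslash escaped: same as the single-quoted form

var_dump(strlen($double));      // 3 on PHP >= 5.2.5
var_dump(strlen($single));      // 4
var_dump($escaped === $single); // true

var_dump(ord("\v"), ord("\f")); // 11 (VT) and 12 (FF) on PHP >= 5.2.5
```

Code that must behave identically before and after 5.2.5 can therefore either
escape the backslash or use single quotes.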

Short version: Serge’s right (at least, not in a minor version).

CVS: Ternary shortcut in 5_3

Changes in CVS that you should probably be aware of include:

  • In ext/dbase, bug #42261
    (Incorrect lengths for date and Boolean data types) was fixed across all three
    current branches [Ilia]
  • There is now support for the prefix namespace::, which is
    resolved to the current namespace name in PHP_5_3 and CVS HEAD [Dmitry]
  • Zend Engine bug #43136
    (possible crash on script execution timeout) was fixed in 5_3 and HEAD.
    Internals note: EG(current_execute_data)->function_state
    now fully replaces the defunct EG(function_state_ptr)
    [Dmitry]
  • PDO bug #42978 (mismatch
    between number of bound params and values causes a crash in pdo_pgsql) was
    fixed across all three branches [Ilia]
  • There is a new constant, ZEND_DEBUG_BUILD, in 5_3 and HEAD
    [Jani]
  • The ternary shortcut operator expr1 ?: expr2 was
    backported to the PHP_5_3 branch, along with a warning that this is
    not ifsetor()! [Johannes]
  • Test suite bug #43035
    (ignore_repeated_errors=On causes lot of tests to fail) was
    fixed [Jani]
  • In the Zend Engine, the macro definitions EXPECTED() and
    UNEXPECTED() have been moved to zend.h in 5_3 and HEAD
    (affects internals only) [Dmitry]
  • The new top-level file README.RELEASE_PROCESS is a direct port
    from the release checklist on Lukas’ wiki [Lukas]
  • Zend Engine bug #43318
    (const allowed outside class definition) was
    fixed in 5_3 and HEAD, alongside a note that const is still
    allowed outside namespaces but arrays are disabled [Dmitry]
  • Core bug #43128 (Very long
    class name causes segfault) was fixed in 5_3 and HEAD [Dmitry]
  • In the date extension, bug #43377 (PHP crashes with invalid argument
    for DateTimeZone) was fixed across all three branches [Ilia]
  • In ext/soap, bug #42952
    (soap cache file is created with insecure permissions) was fixed in the
    PHP_5_3 branch and CVS HEAD [Dmitry]
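
The warning attached to the ternary shortcut in the list above is worth a
small illustration. The sketch below uses made-up variable names; it shows
that expr1 ?: expr2 tests whether expr1 is truthy, whereas an isset()-style
check (the idea behind the proposed ifsetor()) only tests whether the value
exists:

```php
<?php
$count = 0;
$name  = '';

// ?: falls back whenever the left-hand side is falsy.
$shownCount = $count ?: 10;          // 10, because 0 is falsy
$shownName  = $name ?: 'anonymous';  // 'anonymous', because '' is falsy

// An isset()-based check keeps a value that is set but falsy.
$issetCount = isset($count) ? $count : 10; // 0

var_dump($shownCount, $shownName, $issetCount);
```

The shortcut also still evaluates its left-hand side, so it does not suppress
the notice raised for an undefined variable or index.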

In other CVS news, Derick came into Dmitry’s sights when he applied a
one-line patch across all four branches (yes four – remember PHP_4_4) to
initialize the reserved resource bits in the op_array. Dmitry
asked if Derick really intended to slow down compilation just to support
‘some buggy extension’ and introduced him to
zend_extension_op_array_ctor_handler()s, which should be used to
set up reserved data. Derick retorted that this had looked like an Engine bug
to him, since all the other elements of the structure are properly
initialized. Uninitialized variables had caused a number of problems in the
past, up to and including the reference issues in the PHP 4.3 series, and he
was broadly against them. Besides, if the Zend Engine were actually
documented he might have known about those handlers; he’d look into them now.
Stas explained that C allocators don’t initialize memory unless asked, because
of the performance hit involved.

That would be a big deal for Dmitry, this week in particular; he’d spent most
of his time on optimization work. Areas that should now work faster include:
ZEND_FETCH_DIM, math and comparison operations,
zend_do_fcall_common_helper(), ZEND_DO_FCALL and
ZEND_INIT_FCALL_BY_NAME. Marcus was overjoyed, and complimented
Dmitry on finding an acceptable way to make those last changes – ‘three
years after George, Sterling and me had that idea’.

Lukas meanwhile had caught up with Ilia’s negative response to the challenge
over his fix for PDO bug #43130 (Bound parameters cannot have - in their
name) a few weeks ago. He argued that the change is a BC break that could
affect any user; that it could break queries for Oracle users porting to PDO
‘in a very non-obvious way’, and that the benefits of the change numbered
approximately zero. Given the design concept of PDO as a thin layer to unify
the client API and provide only basic emulation, Lukas saw this as a
diversion from the ideal that does considerable harm.

Short version: Whoa, it’s still possible to commit to PHP_4_4?

PAT: PDO antics, and a bit of constifying

Lars Westermann committed a patch against PHP_5_3 from Hans-Peter Oeri at the
start of the week to fix PDO_FIREBIRD bug #43246
(INSERT ... RETURNING ... throws exception).
Hans-Peter followed this up with a patch introducing
PDO::FETCH_2D, which would give a row result consisting of a
two-dimensional hash – first the table name, then the field name:


$result[tablename][columnname]

Columns not resulting from a table would be added to a “null base”, by
default at the first level:


$result[computedcolumn]

The connection attribute ATTR_2D_NULLBASE could be used to define an
alternative “null base”:


$result[nullbase][computedcolumn]

Hans-Peter added that his implementation, currently supporting PDO_MYSQL and
PDO_FIREBIRD, also involved rearranging the FETCH mode constants
to make FETCH_NUM, FETCH_ASSOC and
FETCH_2D bitwise-combinable.
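
To make the proposed result shape concrete, here is a userland sketch. It is
not Hans-Peter’s patch: the function rows_to_2d() is hypothetical, and it
assumes the input row uses "table.column" keys, with keys lacking a table
prefix treated as computed columns:

```php
<?php
// Hypothetical helper: regroup a flat row keyed by "table.column" into
// the two-dimensional layout described above.
function rows_to_2d(array $row, $nullbase = null) {
    $result = array();
    foreach ($row as $key => $value) {
        $pos = strpos($key, '.');
        if ($pos === false) {
            // No table: place at the first level, or under the
            // configured "null base" if one was given.
            if ($nullbase === null) {
                $result[$key] = $value;
            } else {
                $result[$nullbase][$key] = $value;
            }
        } else {
            $table  = substr($key, 0, $pos);
            $column = substr($key, $pos + 1);
            $result[$table][$column] = $value;
        }
    }
    return $result;
}

$row = array('blog.id' => 1, 'author.id' => 7, 'total' => 3);
print_r(rows_to_2d($row));
print_r(rows_to_2d($row, 'computed'));
```

The example row shows the case Hans-Peter had in mind: two joined tables that
both have an id column, which would collide in a flat associative result.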

Lukas wasn’t sure the addition would be ‘real world useful’; in his
experience, there was more of a need for tree structures. That said, he
looked into Hans-Peter’s proposal and suggested only that “nullbase” should
be an empty string rather than providing a potential naming collision. Lukas
mentioned in a follow up post that he would also like to see lazy connect and
driver independent DSN support in PDO, if Hans-Peter was set on creating
feature additions.

Hans-Peter argued that in his real world experience, a framework-like
class was often needed when making changes to joined tables. The
functionality he proposed would be useful when faced with duplicate field
names in different tables, and also when updating tables. The combination of
FETCH_2D with either FETCH_ASSOC or
FETCH_NUM would allow access to fields whose tables were
unknown, with changes to “table-less” fields automagically represented in the
“table” fields. He’d find that extremely useful; although he admittedly
couldn’t say how often, it would be much more useful than the existing
ATTR_FETCH_TABLE_NAMES.

With that off his chest, Hans-Peter cast an eye over Lukas’ wish list:
‘Why lazy connects?’ He also wondered if the username and password
could be included directly in the DSN. Tree support, though, would be added
functionality rather than a new way to return fetched data; it needed serious
discussion. Lukas agreed on the last point; it would only make good sense in
PDO if the information is readily available through the RDBMS. Perhaps tree
support rightly belonged in an ORM. He’d like support for lazy connects
because, when caching, there isn’t always the need for a database connection.
Although it’s already possible to create a PDO instance on demand, having a
lazy connect option would make it easier to switch between modes. It would
also make it easier to deal with libraries that expect a PDO object to be
passed to the constructor. Finally, there are good reasons not to store login
credentials in the DSN; security, and the design goal of PDO as a thin layer.
That said, Lukas thought it would be a good idea to support the PEAR::DB DSN
format, which does include login credentials.

So, on to the rest of this week’s patches – and there were many.

Dmitry found time to add the changes suggested by Wez Furlong to fix the
always_inline symbol collision on certain systems.

Johannes applied some long-standing patches from Benjamin Schulz, bringing
msg_queue_exists() to ext/sysvmsg and stream_supports_lock() to the core in
PHP_5_3 and CVS HEAD.

Stas applied Claudio Cherubino’s PHP 6 patch from last week,
fixing bug #42866
(str_split() returns extra char when given string size is not
multiple of length).

Ilia fulfilled ext/pgsql feature request #43041 (micro-optimizations in
pgsql data retrieval) using a patch initially supplied by Andy Lester
(andy at petdance dot com).

Andy then came up with a couple of the promised patches to ‘constify’ input
arguments to the md5 functions and in dl.c, and asked if it was OK to look at
Zend Engine code too.

Hans-Peter had been busy the while. He’d discovered that
PDO::FETCH_KEY_PAIR doesn’t work as documented; all but
two-column result sets throw an error. He suggested generalizing the constant
to PDO::FETCH_KEYS; a single value would be assigned as a
scalar, and multiple columns as FETCH_ASSOC. He believed this
should be fully backwards compatible, but added an ominous postscript that
would prevent most of the team even looking: ‘My diff includes my “old”
FETCH_2D patch.’

Short version: It can be quite difficult to keep multiple patches
separate, but it pays to make the effort.