Categories


Loading feed
Loading feed

Zend Weekly Summaries Issue #369


BUG: syslog segfault
TLK: Reference cycle collector patch
RFC: Late binding for parent (and other options)
TLK: Struct initializations
BUG: array_slice parameter types
RFC: json_encode flags
TLK: Unicode support and ext/pgsql
TLK: Private property confusion
CVS: This should've been a quiet week...
PAT: 64-bit assembler optimizations

25th November - 1st December 2007

BUG: syslog segfault

Nir Rachmel, who runs PHP 5.2.3 as a statically compiled module in an embedded server (appWeb), reported that his webserver crashes when printing multiple syslogs to the system. He believed he'd traced the source of the trouble via a backtrace to a free() call in openlog(), but wasn't sure how to resolve the problem. Had anyone else run into it?

There was a core backtrace included in Nir's post, and Tony Dovgal asked him to augment it with a printout of the basic_globals contents:

(gdb) p basic_globals

Nir soon found there was no such symbol in the context of any of the frames he'd tried; Tony wrote mournfully that this meant it was a multi-threaded version, which would make things even more complicated. Nir posted the contents of tsrm_ls instead. Tony could only see from this that BG(syslog_device) had somehow been free'd without being NULL'd - which shouldn't happen until shutdown.

Short version: A bad thing happened.

TLK: Reference cycle collector patch

Johannes Schlüter wrote asking for an update on the Zend team's benchmarking of David Wang's reference cycle collector patch, but had no response. Sebastian Bergmann suggested Johannes simply commit the garbage collector to PHP_5_3 now and disable it by default, to make testing and benchmarking easier for all interested parties. Derick Rethans, missing the mischievous 'now' part, replied bemusedly that this was the plan.

Short version: Too long no talk.

RFC: Late binding for parent (and other options)

Following on from the discussion last week about the behaviour of parent::, Mike Lively came up with three different patches against PHP_5_3 to work around the issue, as he saw it, of 'parent:: being considered an explicit class reference'. The patches had different levels of intrusiveness, but all three passed the existing LSB tests.

First up was 'the parent-forwarding patch', which modifies the behaviour of parent:: so that it no longer alters the called_class value stored in the executor globals.

Second came 'the new keyword patch', which has the same effect as the parent-forwarding patch but does it by introducing a new keyword. Mike had no idea what this keyword should be, so for the purposes of demonstration named it __parentf:

class A {
    public static function
test() {
        echo
get_called_class(); // output: B
    
}
}

class
B extends A {
    public static function
test() {
        
__parentf::test();
    }
}

This patch was relatively high-cost in terms of performance, and Mike hoped someone might know a cheaper way to achieve the same result.

Finally, there was 'the forward_static_call() patch'. This approach introduces two new functions, forward_static_call() and forward_static_array(), which are similar in nature to call_user_func*() except that the first parameter specifies the target method in the parent class, rather than a callback:

class B extends A {
    public static function
test() {
        
forward_static_call(__FUNCTION__);
    }
}

This patch, being completely outside the Zend Engine, didn't impact the existing LSB functionality in any way. Mike would be prepared to work further on any of these patches that seemed worth pursuing; he asked for feedback.

Alexey Zakhlestin ventured that he thought the first patch the most natural and intuitive of the three, which brought the wrath of Stas Malyshev upon his head. Stas didn't see how making parent::foo() and A::foo() work differently was either "natural" or "intuitive". Alexey argued that A::foo() had nothing to do with inheritance, whereas parent::foo() plainly did; keeping the context made perfect sense to him. Elizabeth Smith agreed; 'I shudder to think of the number of "bogus" bug reports that'll pop up if LSB doesn't work this way'. She believed classname::foo should mean something other than parent::foo() when it comes to statics. David Zülke also took this view, but noted that the opposition was stronger. He suggested that static should always point to the original callee; get_called_class() should reflect the end of the call chain, in line with Stas' argument; and a new function, get_static_class(), should reflect the start of the call chain. Stas' response to that idea was 'Ouch'. Not only would parent:: and classname:: behave differently, but there would be two separate functions for getting the called class, and one of them with non-functional notation. Besides, if get_called_class() was only supposed to return the name of the parent class, why not use __CLASS__? Mike reminded him that get_class() also exists. David dropped his idea, and went back to the original notion - that get_called_class() should maintain the context of the original callee.

Short version: Stas 1, rest of the world 2.

TLK: Struct initializations

Andy Lester, true to his word, had been making PHP build with more stringent compiler error checking - specifically, -Wextra under GCC. He produced a patch to clean up 'dozens of struct initializations' which, while being valid C code, could hide future problems due to the compiler output triggered by them. Stas queried this - didn't the C standard mandate filling uninitialized fields with 0? - but Marcus Börger saw it as 'more correct code' and voted that Andy's patch be applied in PHP_5_3 and CVS HEAD. Andy responded to Stas, saying that the patch was intended to benefit humans rather than the compiler. Ilia Alshanetsky believed {0} sufficiently clear for humans, but Alexey - who, it should be said, doesn't spend every day fixing bugs in the PHP core - pointed out that {NULL, 0, 0} brings the real structure dimensions directly to the context, potentially leading to fewer header checks.

Andy pointed out that his chief aim was actually to reduce the compiler warnings triggered by {0}; at present there are so many warnings about the number of initializers that all are ignored. The key to long-term maintenance was to clean up the background noise at higher levels of error reporting. Ilia argued that most compiler warnings can safely be ignored, and some are even bogus. Pedantic changes make the code harder to read - particularly when it comes to large and complex structures - and he saw no benefit in them. Andy replied that if his work wasn't useful to the PHP core team, he might as well walk away. Ilia wrote, more gently, that the entire team welcomed any work in the direction of improved maintainability; the problem here was one of balance. Compilers don't always get it right; they can generate warnings when there's nothing wrong with the code. That being the case, it didn't make a lot of sense to code for the compiler with its warning levels turned right up; it made more sense to code for readability. Andy viewed the compiler as akin to a linting mechanism rather than as something to be circumvented; he saw initialized structs as a kind of seatbelt. Constifying and fixing signed/unsigned mismatches were on his list too, but he was trying to get the low-hanging fruit out of the way first.

Stas commented that if a new field is added to a structure, Andy's changes would mean having to go through all the code that mentions that structure adding zeroes. He didn't see this as helpful, particularly given that the compiler can implicitly do that job and the C standard explicitly provides the means for it. He produced coding guidelines and a quotation from the GCC manual to back up his arguments. Andy pointed out softly that having to manually add zeroes ensures that all those errors are gone from the build. Stas retorted that the compiler could do that zero-adding, and it could do it with {0}. That did it for Andy; he said goodbye.

Short version: That was... quick.

BUG: array_slice parameter types

One Dirk Thomas had a question about parameter typing as it appeared in his PHP_5_3 snapshot. What exactly was the desired behaviour when:

array_slice(array $array, int $offset, int $length);

was passed a non-integer length parameter? He'd found that casting it to float resulted in an empty array rather than - as he'd anticipated - a notice or warning. In his opinion, either a warning should be given or the function should handle the wrongly typed parameter gracefully. Was this behaviour specific to array_slice(), or did all PHP functions do the same? There was a bit of a clue in Dirk's final sentence: 'It works with 5.2.x without a problem.'

Marco Kaiser and Moritz Bechler both looked into the source and decided that the length value wasn't being correctly cast; there should be something resembling

convert_to_long(length_param);

in there somewhere. Stas demurred; he didn't see why the length parameter was being parsed as a zval in the first place. It should just be parsed as a long from the start.

Short version: That one just went from a one-line fix to a research job in one easy sentence.

RFC: json_encode flags

Sara Golemon made a welcome appearance on internals@ with a proposal for a new json_encode() feature (posted three times, but what the hey, it's Sara so we'll let her off). She explained that, while it's technically safe to include user supplied data in json_encode() serialized values, certain characters (the usual suspects - <, >, & and ') left room for error. Her patch offered an optional second parameter, options, and a set of new flags to allow the developer to encode those characters to hex references - not exactly a complete security solution, but 'one more rope holding the ship together'. Since this adds five characters for each escaped character, the default behaviour would be to leave them alone, as now. To demonstrate usage:

echo json_encode("<foo>");
"<foo>"

echo json_encode("<foo>", JSON_HEX_TAG);
"\u003Cfoo\u003E"

echo json_encode("<foo bar='baz'>", JSON_HEX_TAG | JSON_HEX_APOS);
"\u003Cfoo bar=\u0027baz\u0027\u003E"

If there were no objections, Sara planned to commit this change after a week.

Rasmus Lerdorf wrote that this had actually been on his TODO, 'but you are way more productive than I am as witnessed by the 3 copies of this proposal we got. ;)'

Alexey wondered what was wrong with using str_replace()? Sara admitted that it resolves the problem, but pointed out that taking that approach when faced with a series of nested arrays and objects 'becomes onerous'. Stas wondered what the patch was supposed to help, given that it didn't really offer a security solution. If security wasn't Sara's chief concern, what was? Sara explained that certain browsers misinterpret legitimate data, and the characters get broken out of their proper context. Stas asked her to clarify the attack vectors. Sara couldn't; she wrote that she didn't know of any successful vectors currently. Then again, she would have sworn that echoing htmlentity'd data was safe, until she came across a browser where it wasn't. The point of the patch was to give PHP users the option to be paranoic in their approach. There were existing ways to achieve the same thing, but they were 'hacky'; performing the escaping inside the call seemed more appropriate to her. Stas responded that he'd wanted to know because there needs to be some explanation provided about the new feature, such as when to use it and what it does, and he wasn't sure he could easily provide it.

Rasmus pointed out that attack vectors move around, and since the proposed syntax is valid JSON it would at least give PHP users another tool. Documenting specific attack vectors seemed of secondary importance to him. He then documented a specific contextual attack vector against htmlentities():

<?php $foo = htmlspecialchars($_GET['foo'], ENT_QUOTES);?>
<a href="" onmouseover="a='<?php echo $foo?>';">Mouse Over Me</a>

'Then try hitting the page and set ?foo=';alert(0);//'

Rasmus added that this isn't a failure of the htmlentities() function itself; it just means it was used in the wrong context. He could imagine contexts where having a version of JSON without HTML tags and quotes could be quite useful.

Short version: Say it once, say it twice, say it again.

TLK: Unicode support and ext/pgsql

PHP user Michael Eshom reported that PHP 6 --with-pgsql was failing to compile under OpenSUSE 10.3 against a home build of PostgreSQL 8.3 beta 1, and attached a long list of make errors to his post. Tony committed a patch and asked Michael to try again, explaining that he doesn't use PostgreSQL himself and doesn't have it installed.

Michael confirmed that the fix worked, but noted that there were still a bunch of "incompatible pointer type" warning messages. Tony explained that those warnings are expected; they stem from the lack of Unicode support in the pgsql extension. Given that there hadn't been much activity from the extension maintainers lately, he wouldn't expect that to change in the near future either. Would Michael care to give them a hand? Michael replied that he wouldn't know where to start, but Tony, sensing an easy victory, regarded this as no problem; 'we can show you what to do'. Michael made a feeble attempt to save himself by writing that he hadn't done anything with C over the last 5 or 6 years, but agreed to see what he could do if Tony was willing to help him out.

Short version: If you want to avoid contributing, don't post intelligent bug reports.

TLK: Private property confusion

Marco Kaiser posted some horribly convoluted code intended to demonstrate that you can access a private class property by setting that property in an object of the same type, and then setting that object as the value of a second property. The surprising thing, to him, was that a child object loaded as a property of the parent failed to reset the other property - the one it shared with the parent - to its own value:

<?php

class MyClass {
    protected
$_object = null;
    private
$_value = 50;

    
// public getters and setters
}

class
ChildClass extends MyClass {
    private
$_value = 100;
}

$class = new MyClass();
$child = new ChildClass();

$class->setObject($child);

echo
$class->getValue(); // 50
echo $child->getValue(); // 50

?>

Marco wanted to know why he was able to access a private property in the first case, and wondered if there was a bug in the second? The child class should have no knowledge of the parent's private property, and he should be able to re-use the name of that property there. Johannes found his way through the maze (I tidied the example a bit) and managed a halfway coherent explanation; a) he couldn't, and b) extended classes know nothing about private members of the parent class. Etienne Kneuss added more helpfully that the accessibility check is based on the current scope, rather than on the instance. Encapsulation is conserved, as private properties are accessible only through methods of the defining class.

Jingcheng Zhang felt they'd both missed something and there was probably a bug involved. If var_dump($child) can be used to view a private property in the parent object, surely it should also be possible to access that property via a public getter in the child, since by implication the property is extended. Etienne explained that this is what protected does; moreover, var_dump() in PHP_5_3 now provides the name of the origin of an 'inherited' private variable:

object(ClassB)#1 (1) {
    ["a":"ClassA":private]=>
    int(2)
}

Short version: Object debugging is a lot easier in PHP_5_3 now.

CVS: This should've been a quiet week...

Changes in CVS that you should probably be aware of include:

  • In ext/dom, bug #43364 (recursive xincludes don't remove internal nodes properly) was fixed in PHP_5_2, PHP_5_3 and CVS HEAD [Rob Richards]
  • In ext/mcrypt, bug #43143 (Warning about empty IV with MCRYPT_MODE_ECB) was fixed in 5_3 and HEAD [Derick]
  • In ext/pcntl, bug #43373 (pcntl_fork() should not raise E_ERROR on error) was fixed across all three branches [Ilia]
  • Core bug #43365 (Several enums have trailing commas) was fixed across all three branches [Jani Taskinen]
  • Core bug #43386 (array_globals not reset to 0 properly on init) was fixed across all three branches [Ilia]
  • In the PDO extension, support for the - character in bound parameter names was reverted, making bug #43130 a "won't fix" [Ilia]

Almost half the mailing list volume this week was sparked by a three-line commit:

wez Tue Nov 27 18:53:17 2007 UTC
Modified files:
/CVSROOT avail loginfo
Log: karma and ml for some pdo bits

+# PDO Specs. CLA required to commit
+unavail||pdo-specs
+avail|wez,andi,stas,iliaa|pdo-specs

The responses started out short and sweet, on the whole:

'Please explain what is this 'pdo-specs' thing and why CLA is needed to commit there.' (Tony)

'One word: transparency. It is amazing how it helps to discuss things instead of acting like that.' (Pierre-Alain Joye)

'Please remove this from PHP. PHP is open source, making this closed and bringing in patents at the end makes me very angry. This has absolutely nothing to do inside PHP.' (Marcus)

but like Topsy they "growed and growed" as it dawned on the old-timers that the newcomers to php.net didn't know what this meant. A few people - Lukas Smith ('I find it amazing how oblivious to community concerns this all is...'), Rasmus Lerdorf ('Keep it out of PHP CVS and host it somewhere else.'), Marcus again ('If people want to commit they can do it in their free time') - were concerned enough to write very long and passionately angry posts, much to the bewilderment of many.

PHP user Daniel Brown was one of the bewildered many. Wasn't it possible that IBM's (we assumed) internal policies required them to have CLAs in place? Perhaps for tax and documentation purposes, or to protect their own engineers from liability? He agreed that a CLA is not in the spirit of the PHP project, but didn't understand how a document essentially offering complete freedom to redistribute the code could be 'such a Big Brother issue'. All Daniel could see as a threat was that CLA'd modules might lay the groundwork for new licensing in the future. Rasmus explained that he didn't have a problem with anyone needing a CLA for components specific to their own products; he agreed that only the final license should matter to most people in that situation. However, general PHP components - such as the whole of PDO rather than a specific PDO driver - were a different matter. Derick added that the onus was on the company to fit into the open source model rather than on the open source model to adapt to commercial needs.

Stas made the mistake of thinking this was a standard internals@ rumpus and complained about the "us vs them" attitude. Wasn't it worth considering that having good database support might be good for all concerned? Pierre, Greg Beaver and I gave warning growls. Richard Quadling, though, was easily swayed by Stas' argument, being another of the bewildered many: 'If IBM, or anyone, wants to submit code to PHP it can only improve things.' He went on to extol the virtues of peer review and the way that nothing gets into CVS without it. I explained that a CLA renders the peer review process null and void, since php.net no longer have control over that area of code; the dev team can't touch it.

Jacques Marneweck of the PHP documentation group posted an update; this particular CLA appeared to relate only to DocBook XML specifications for PDO. I pointed out that the same still applies; if a commit to that module breaks the PHP manual build, we (including the PHP Manual Editor) are prohibited from fixing it. Philip Olson, who is the PHP Manual Editor, guiltily remembered an IBM copyright notice attached to the internals documentation for PDO drivers; IBM had given permission to remove that notice some time ago, but it hadn't yet been done. He promised to make the update 'today'. Marcus asked him to refuse any copyright other than "The PHP Group" or "The PHP Documentation Group" in future, 'otherwise companies are going to own PHP piece by piece'. Hannes Magnusson explained that the copyright notice had simply been committed; there had been no notification or request. There was also no indication that those docs were covered by a CLA. He hoped not, since he'd been committing fixes to them. (They weren't; that's actually impossible.)

Dan Scott wrote a long post recommending the adoption of a legal entity that third parties could assign copyright to, rather oddly declaring that the PHP Group and PHP Documentation Group are not legal entities. ('Oddly' because MySQL AB did exactly this with ext/mysqlnd.) Derick pointed out that in all the countries that signed up to the Berne Convention copyright is protected by default, copyright notice or no. Daniel Brown backed him; the only reason to register the copyright would be if infringement were anticipated.

Rasmus wrote another long post in response to Richard's optimistic mail, explaining that the legal terminology is complex enough to effectively block anyone from signing the contract without first taking legal advice. Companies with a patent portfolio have legitimate concerns when it comes to their employees signing a CLA. Depending on the significance of the module, this could leave php.net in an untenable position. Adam Maccabee Trachtenberg wrote to back Rasmus' arguments, adding that he personally feels less inclined to work on projects with a CLA attached due to the legal hassle. Interestingly, Adam reported some success in getting his company's legal department to agree to releases of internal code under open source licenses, so they evidently weren't unfamiliar with the issues. Marcus had a similar approach; since the problem with a CLA is that there is usually an element of disclosure, the solution was to 'keep your patent stuff at home' and simply get legal permission to work on open source projects.

Stas came back for another round, but we all gave him the mailing list equivalent of hard stares and he soon went away again.

Dan Scott noticed a CVS account request from Jay Pipes asking for access to the new module. He wondered why, if this was purely an IBM thing, MySQL AB employees were signing up for it? After all, MySQL AB has a CLA in place for all third-party contributions to the MySQL code base - perhaps they were the ones pushing for CLA in PHP now? You could almost hear MySQL AB twitching at the accusation. A few hours later, Jay Pipes posted something that sounded very much like an official statement of the company's position, emphatically denying the charge. Essentially, as the owners of a dual licensed code base, they have to require their contributors to sign a CLA to avoid issues arising under the commercial license. PHP, on the other hand, has a completely different licensing and copyright background, and MySQL AB understood that the core PHP developers don't favour CLAs. That being the case, while understanding that 'CLAs are the default way for achieving clarity on IP ownership', MySQL AB saw no reason to make any changes to the current approach.

Short version: Light blue touchpaper, stand well back.

PAT: 64-bit assembler optimizations

Prior to instigating the small war in CVS, Wez Furlong picked up the mixed patch for PDO::FETCH_KEYS and friends posted by Hans-Peter Oeri last week. He had a number of questions about its purpose, the allocation of an empty string for every database connection, and the need for this in the PHP core. Basically, Wez believed that this level of manipulation belongs firmly in user space.

Hans-Peter wrote back defending his babies, by which time Wez had thrown his hand grenade and swiftly disappeared. Marcus in the meantime had asked Hans-Peter for tests to go with his patch, not least so that he could see what the aims of it were. Hans-Peter happily obliged.

And finally, Brian Shire offered a patch against PHP 5.2.5 bringing x86-64 assembly optimizations to zend_alloc.c. He included a simple micro benchmark to test memory allocation, and added that he'd like to get some feedback and results from others on this. He also had some 32-bit related questions for the assembler gurus on the team. I won't pretend to understand those; see the patch link for details.

Short version: It all sounds great, excepting the PHP 5.2.5 part...

Comments