Zend Weekly Summaries Issue #369
BUG: syslog segfault
TLK: Reference cycle collector patch
RFC: Late binding for parent (and other options)
TLK: Struct initializations
BUG: array_slice parameter types
RFC: json_encode flags
TLK: Unicode support and ext/pgsql
TLK: Private property confusion
CVS: This should've been a quiet week...
PAT: 64-bit assembler optimizations
BUG: syslog segfault
Nir Rachmel, who runs PHP 5.2.3 as a statically compiled module in an
embedded server (appWeb), reported that his webserver crashes when printing
multiple syslogs to the system. He believed he'd traced the source of the trouble
via a backtrace to a free() call in openlog(), but
wasn't sure how to resolve the problem. Had anyone else run into it?
There was a core backtrace included in Nir's post, and Tony Dovgal asked him
to augment it with a printout of the basic_globals contents:
(gdb) p basic_globals
|
Nir soon found there was no such symbol in the context of any of the frames
he'd tried; Tony wrote mournfully that this meant it was a multi-threaded
version, which would make things even more complicated. Nir posted the
contents of tsrm_ls instead. Tony could only see from this that
BG(syslog_device) had somehow been free'd without
being NULL'd - which shouldn't happen until shutdown.
Short version: A bad thing happened.
TLK: Reference cycle collector patch
Johannes Schlüter wrote asking for an update on the Zend team's benchmarking of David Wang's reference cycle collector patch, but had no response. Sebastian Bergmann suggested Johannes simply commit the garbage collector to PHP_5_3 now and disable it by default, to make testing and benchmarking easier for all interested parties. Derick Rethans, missing the mischievous 'now' part, replied bemusedly that this was the plan.
Short version: Too long no talk.
RFC: Late binding for parent (and other options)
Following on from the discussion last week
about the behaviour of parent::, Mike Lively came up with three
different patches against PHP_5_3 to work around the issue, as he saw it, of
'parent:: being considered an explicit class reference'. The patches
had different levels of intrusiveness, but all three passed the existing LSB tests.
First up was 'the parent-forwarding patch', which modifies the
behaviour of parent:: so that it no longer alters the
called_class value stored in the executor globals.
Second came 'the new keyword patch', which has the same effect as the
parent-forwarding patch but does it by introducing a new keyword. Mike had no
idea what this keyword should be, so for the purposes of demonstration named
it __parentf:
class
A {
|
This patch was relatively high-cost in terms of performance, and Mike hoped someone might know a cheaper way to achieve the same result.
Finally, there was 'the forward_static_call() patch'. This approach
introduces two new functions, forward_static_call() and
forward_static_array(), which are similar in nature to
call_user_func*() except that the first parameter specifies the
target method in the parent class, rather than a callback:
class B extends A
{
|
This patch, being completely outside the Zend Engine, didn't impact the existing LSB functionality in any way. Mike would be prepared to work further on any of these patches that seemed worth pursuing; he asked for feedback.
Alexey Zakhlestin ventured that he thought the first patch the most natural
and intuitive of the three, which brought the wrath of Stas Malyshev upon his
head. Stas didn't see how making parent::foo() and
A::foo() work differently was either "natural" or "intuitive".
Alexey argued that A::foo() had nothing to do with inheritance,
whereas parent::foo() plainly did; keeping the context made
perfect sense to him. Elizabeth Smith agreed; 'I shudder to think of the
number of "bogus" bug reports that'll pop up if LSB doesn't work this
way'. She believed classname::foo should mean
something other than parent::foo() when it comes to statics.
David Zülke also took this view, but noted that the opposition was
stronger. He suggested that static should always point to the original
callee; get_called_class() should reflect the end of the call
chain, in line with Stas' argument; and a new function,
get_static_class(), should reflect the start of the call chain.
Stas' response to that idea was 'Ouch'. Not only would
parent:: and classname:: behave differently, but
there would be two separate functions for getting the called class, and one
of them with non-functional notation. Besides, if
get_called_class() was only supposed to return the name of the
parent class, why not use __CLASS__? Mike reminded him that
get_class() also exists. David dropped his idea, and went back
to the original notion - that get_called_class() should maintain
the context of the original callee.
Short version: Stas 1, rest of the world 2.
TLK: Struct initializations
Andy Lester, true to his word, had been making PHP build with more stringent
compiler error checking - specifically, -Wextra under GCC. He
produced a patch to clean up 'dozens of struct initializations' which,
while being valid C code, could hide future problems due to the compiler
output triggered by them. Stas queried this - didn't the C standard mandate
filling uninitialized fields with 0? - but Marcus Börger
saw it as 'more correct code' and voted that Andy's patch be applied
in PHP_5_3 and CVS HEAD. Andy responded to Stas, saying that the patch was
intended to benefit humans rather than the compiler. Ilia Alshanetsky
believed {0} sufficiently clear for humans, but Alexey - who, it
should be said, doesn't spend every day fixing bugs in the PHP core - pointed
out that {NULL, 0, 0} brings the real structure dimensions
directly to the context, potentially leading to fewer header checks.
Andy pointed out that his chief aim was actually to reduce the compiler
warnings triggered by {0}; at present there are so many warnings
about the number of initializers that all are ignored. The key to long-term
maintenance was to clean up the background noise at higher levels of error
reporting. Ilia argued that most compiler warnings can safely be ignored, and
some are even bogus. Pedantic changes make the code harder to read -
particularly when it comes to large and complex structures - and he saw no
benefit in them. Andy replied that if his work wasn't useful to the PHP core
team, he might as well walk away. Ilia wrote, more gently, that the entire
team welcomed any work in the direction of improved maintainability; the
problem here was one of balance. Compilers don't always get it right; they
can generate warnings when there's nothing wrong with the code. That being
the case, it didn't make a lot of sense to code for the compiler with its
warning levels turned right up; it made more sense to code for readability.
Andy viewed the compiler as akin to a linting mechanism rather than as
something to be circumvented; he saw initialized structs as a kind of
seatbelt. Constifying and fixing signed/unsigned mismatches were on his list
too, but he was trying to get the low-hanging fruit out of the way first.
Stas commented that if a new field is added to a structure, Andy's changes
would mean having to go through all the code that mentions that structure
adding zeroes. He didn't see this as helpful, particularly given that the
compiler can implicitly do that job and the C standard explicitly provides
the means for it. He produced coding guidelines and a quotation from the GCC
manual to back up his arguments. Andy pointed out softly that having to
manually add zeroes ensures that all those errors are gone from the build.
Stas retorted that the compiler could do that zero-adding, and it could do it
with {0}. That did it for Andy; he said goodbye.
Short version: That was... quick.
BUG: array_slice parameter types
One Dirk Thomas had a question about parameter typing as it appeared in his PHP_5_3 snapshot. What exactly was the desired behaviour when:
array_slice(array $array, int $offset,
int $length);
|
was passed a non-integer length parameter? He'd found that
casting it to float resulted in an empty array rather than - as
he'd anticipated - a notice or warning. In his opinion, either a warning
should be given or the function should handle the wrongly typed parameter
gracefully. Was this behaviour specific to array_slice(), or
did all PHP functions do the same? There was a bit of a clue in Dirk's final
sentence: 'It works with 5.2.x without a problem.'
Marco Kaiser and Moritz Bechler both looked into the source and decided that
the length value wasn't being correctly cast; there should be
something resembling
convert_to_long(length_param);
|
in there somewhere. Stas demurred; he didn't see why the length
parameter was being parsed as a zval in the first place. It
should just be parsed as a long from the start.
Short version: That one just went from a one-line fix to a research job in one easy sentence.
RFC: json_encode flags
Sara Golemon made a welcome appearance on internals@ with a proposal for a
new json_encode() feature (posted three times, but what the hey,
it's Sara so we'll let her off). She explained that, while it's technically
safe to include user supplied data in json_encode() serialized
values, certain characters (the usual suspects - <,
>, & and ') left room for error. Her
patch offered an optional second parameter, options, and a set
of new flags to allow the developer to encode those characters to hex
references - not exactly a complete security solution, but 'one more rope
holding the ship together'. Since this adds five characters for each
escaped character, the default behaviour would be to leave them alone, as
now. To demonstrate usage:
echo
json_encode("<foo>");
|
If there were no objections, Sara planned to commit this change after a week.
Rasmus Lerdorf wrote that this had actually been on his TODO, 'but you are way more productive than I am as witnessed by the 3 copies of this proposal we got. ;)'
Alexey wondered what was wrong with using str_replace()? Sara
admitted that it resolves the problem, but pointed out that taking that
approach when faced with a series of nested arrays and objects 'becomes
onerous'. Stas wondered what the patch was supposed to help, given that
it didn't really offer a security solution. If security wasn't Sara's chief
concern, what was? Sara explained that certain browsers misinterpret
legitimate data, and the characters get broken out of their proper context.
Stas asked her to clarify the attack vectors. Sara couldn't; she wrote that
she didn't know of any successful vectors currently. Then again, she
would have sworn that echoing htmlentity'd data was safe, until she came
across a browser where it wasn't. The point of the patch was to give PHP
users the option to be paranoic in their approach. There were existing ways
to achieve the same thing, but they were 'hacky'; performing the
escaping inside the call seemed more appropriate to her. Stas responded that
he'd wanted to know because there needs to be some explanation provided about
the new feature, such as when to use it and what it does, and he wasn't sure
he could easily provide it.
Rasmus pointed out that attack vectors move around, and since the proposed
syntax is valid JSON it would at least give PHP users another tool.
Documenting specific attack vectors seemed of secondary importance to him. He
then documented a specific contextual attack vector against
htmlentities():
<?php $foo = htmlspecialchars($_GET['foo'], ENT_QUOTES);?>
|
'Then try hitting the page and set ?foo=';alert(0);//'
Rasmus added that this isn't a failure of the htmlentities()
function itself; it just means it was used in the wrong context. He could
imagine contexts where having a version of JSON without HTML tags and quotes
could be quite useful.
Short version: Say it once, say it twice, say it again.
TLK: Unicode support and ext/pgsql
PHP user Michael Eshom reported that PHP 6 --with-pgsql was
failing to compile under OpenSUSE 10.3 against a home build of PostgreSQL 8.3
beta 1, and attached a long list of make errors to his post. Tony
committed a patch and asked Michael to try again, explaining that he doesn't
use PostgreSQL himself and doesn't have it installed.
Michael confirmed that the fix worked, but noted that there were still a bunch of "incompatible pointer type" warning messages. Tony explained that those warnings are expected; they stem from the lack of Unicode support in the pgsql extension. Given that there hadn't been much activity from the extension maintainers lately, he wouldn't expect that to change in the near future either. Would Michael care to give them a hand? Michael replied that he wouldn't know where to start, but Tony, sensing an easy victory, regarded this as no problem; 'we can show you what to do'. Michael made a feeble attempt to save himself by writing that he hadn't done anything with C over the last 5 or 6 years, but agreed to see what he could do if Tony was willing to help him out.
Short version: If you want to avoid contributing, don't post intelligent bug reports.
TLK: Private property confusion
Marco Kaiser posted some horribly convoluted code intended to demonstrate that you can access a private class property by setting that property in an object of the same type, and then setting that object as the value of a second property. The surprising thing, to him, was that a child object loaded as a property of the parent failed to reset the other property - the one it shared with the parent - to its own value:
<?php
|
Marco wanted to know why he was able to access a private property in the first case, and wondered if there was a bug in the second? The child class should have no knowledge of the parent's private property, and he should be able to re-use the name of that property there. Johannes found his way through the maze (I tidied the example a bit) and managed a halfway coherent explanation; a) he couldn't, and b) extended classes know nothing about private members of the parent class. Etienne Kneuss added more helpfully that the accessibility check is based on the current scope, rather than on the instance. Encapsulation is conserved, as private properties are accessible only through methods of the defining class.
Jingcheng Zhang felt they'd both missed something and there was
probably a bug involved. If var_dump($child) can be used to view
a private property in the parent object, surely it should also be possible to
access that property via a public getter in the child, since by implication
the property is extended. Etienne explained that this is what
protected does; moreover, var_dump() in PHP_5_3 now
provides the name of the origin of an 'inherited' private variable:
object(ClassB)#1 (1) {
|
Short version: Object debugging is a lot easier in PHP_5_3 now.
CVS: This should've been a quiet week...
Changes in CVS that you should probably be aware of include:
- In ext/dom, bug
#43364 (recursive
xincludes don't remove internal nodes properly) was fixed in PHP_5_2, PHP_5_3 and CVS HEAD [Rob Richards] - In ext/mcrypt, bug
#43143 (Warning about empty
IVwithMCRYPT_MODE_ECB) was fixed in 5_3 and HEAD [Derick] - In ext/pcntl, bug
#43373 (
pcntl_fork()should not raiseE_ERRORon error) was fixed across all three branches [Ilia] - Core bug #43365 (Several enums have trailing commas) was fixed across all three branches [Jani Taskinen]
- Core bug #43386
(
array_globalsnot reset to0properly on init) was fixed across all three branches [Ilia] - In the PDO extension, support for the
-character in bound parameter names was reverted, making bug #43130 a "won't fix" [Ilia]
Almost half the mailing list volume this week was sparked by a three-line commit:
wez Tue Nov 27 18:53:17 2007 UTC Modified files: /CVSROOT avail loginfo Log: karma and ml for some pdo bits +# PDO Specs. CLA required to commit +unavail||pdo-specs +avail|wez,andi,stas,iliaa|pdo-specs
The responses started out short and sweet, on the whole:
'Please explain what is this 'pdo-specs' thing and why CLA is needed to commit there.' (Tony)
'One word: transparency. It is amazing how it helps to discuss things instead of acting like that.' (Pierre-Alain Joye)
'Please remove this from PHP. PHP is open source, making this closed and bringing in patents at the end makes me very angry. This has absolutely nothing to do inside PHP.' (Marcus)
but like Topsy they "growed and growed" as it dawned on the old-timers that the newcomers to php.net didn't know what this meant. A few people - Lukas Smith ('I find it amazing how oblivious to community concerns this all is...'), Rasmus Lerdorf ('Keep it out of PHP CVS and host it somewhere else.'), Marcus again ('If people want to commit they can do it in their free time') - were concerned enough to write very long and passionately angry posts, much to the bewilderment of many.
PHP user Daniel Brown was one of the bewildered many. Wasn't it possible that IBM's (we assumed) internal policies required them to have CLAs in place? Perhaps for tax and documentation purposes, or to protect their own engineers from liability? He agreed that a CLA is not in the spirit of the PHP project, but didn't understand how a document essentially offering complete freedom to redistribute the code could be 'such a Big Brother issue'. All Daniel could see as a threat was that CLA'd modules might lay the groundwork for new licensing in the future. Rasmus explained that he didn't have a problem with anyone needing a CLA for components specific to their own products; he agreed that only the final license should matter to most people in that situation. However, general PHP components - such as the whole of PDO rather than a specific PDO driver - were a different matter. Derick added that the onus was on the company to fit into the open source model rather than on the open source model to adapt to commercial needs.
Stas made the mistake of thinking this was a standard internals@ rumpus and complained about the "us vs them" attitude. Wasn't it worth considering that having good database support might be good for all concerned? Pierre, Greg Beaver and I gave warning growls. Richard Quadling, though, was easily swayed by Stas' argument, being another of the bewildered many: 'If IBM, or anyone, wants to submit code to PHP it can only improve things.' He went on to extol the virtues of peer review and the way that nothing gets into CVS without it. I explained that a CLA renders the peer review process null and void, since php.net no longer have control over that area of code; the dev team can't touch it.
Jacques Marneweck of the PHP documentation group posted an update; this particular CLA appeared to relate only to DocBook XML specifications for PDO. I pointed out that the same still applies; if a commit to that module breaks the PHP manual build, we (including the PHP Manual Editor) are prohibited from fixing it. Philip Olson, who is the PHP Manual Editor, guiltily remembered an IBM copyright notice attached to the internals documentation for PDO drivers; IBM had given permission to remove that notice some time ago, but it hadn't yet been done. He promised to make the update 'today'. Marcus asked him to refuse any copyright other than "The PHP Group" or "The PHP Documentation Group" in future, 'otherwise companies are going to own PHP piece by piece'. Hannes Magnusson explained that the copyright notice had simply been committed; there had been no notification or request. There was also no indication that those docs were covered by a CLA. He hoped not, since he'd been committing fixes to them. (They weren't; that's actually impossible.)
Dan Scott wrote a long post recommending the adoption of a legal entity that third parties could assign copyright to, rather oddly declaring that the PHP Group and PHP Documentation Group are not legal entities. ('Oddly' because MySQL AB did exactly this with ext/mysqlnd.) Derick pointed out that in all the countries that signed up to the Berne Convention copyright is protected by default, copyright notice or no. Daniel Brown backed him; the only reason to register the copyright would be if infringement were anticipated.
Rasmus wrote another long post in response to Richard's optimistic mail, explaining that the legal terminology is complex enough to effectively block anyone from signing the contract without first taking legal advice. Companies with a patent portfolio have legitimate concerns when it comes to their employees signing a CLA. Depending on the significance of the module, this could leave php.net in an untenable position. Adam Maccabee Trachtenberg wrote to back Rasmus' arguments, adding that he personally feels less inclined to work on projects with a CLA attached due to the legal hassle. Interestingly, Adam reported some success in getting his company's legal department to agree to releases of internal code under open source licenses, so they evidently weren't unfamiliar with the issues. Marcus had a similar approach; since the problem with a CLA is that there is usually an element of disclosure, the solution was to 'keep your patent stuff at home' and simply get legal permission to work on open source projects.
Stas came back for another round, but we all gave him the mailing list equivalent of hard stares and he soon went away again.
Dan Scott noticed a CVS account request from Jay Pipes asking for access to the new module. He wondered why, if this was purely an IBM thing, MySQL AB employees were signing up for it? After all, MySQL AB has a CLA in place for all third-party contributions to the MySQL code base - perhaps they were the ones pushing for CLA in PHP now? You could almost hear MySQL AB twitching at the accusation. A few hours later, Jay Pipes posted something that sounded very much like an official statement of the company's position, emphatically denying the charge. Essentially, as the owners of a dual licensed code base, they have to require their contributors to sign a CLA to avoid issues arising under the commercial license. PHP, on the other hand, has a completely different licensing and copyright background, and MySQL AB understood that the core PHP developers don't favour CLAs. That being the case, while understanding that 'CLAs are the default way for achieving clarity on IP ownership', MySQL AB saw no reason to make any changes to the current approach.
Short version: Light blue touchpaper, stand well back.
PAT: 64-bit assembler optimizations
Prior to instigating the small war in CVS, Wez Furlong picked up the mixed
patch for PDO::FETCH_KEYS and friends posted by Hans-Peter Oeri
last week.
He had a number of questions about its purpose, the allocation of an empty
string for every database connection, and the need for this in the PHP core.
Basically, Wez believed that this level of manipulation belongs firmly in
user space.
Hans-Peter wrote back defending his babies, by which time Wez had thrown his hand grenade and swiftly disappeared. Marcus in the meantime had asked Hans-Peter for tests to go with his patch, not least so that he could see what the aims of it were. Hans-Peter happily obliged.
And finally, Brian Shire offered a patch against PHP 5.2.5 bringing x86-64 assembly optimizations to zend_alloc.c. He included a simple micro benchmark to test memory allocation, and added that he'd like to get some feedback and results from others on this. He also had some 32-bit related questions for the assembler gurus on the team. I won't pretend to understand those; see the patch link for details.
Short version: It all sounds great, excepting the PHP 5.2.5 part...

Comments