Zend Weekly Summaries Issue #369

January 29, 2008

Uncategorized

BUG: syslog segfault
TLK: Reference cycle collector patch
RFC: Late binding for parent (and other options)
TLK: Struct initializations
BUG: array_slice parameter types
RFC: json_encode flags
TLK: Unicode support and ext/pgsql
TLK: Private property confusion
CVS: This should’ve been a quiet week…
PAT: 64-bit assembler optimizations

25th November – 1st December 2007


BUG: syslog segfault

Nir Rachmel, who runs PHP 5.2.3 as a statically compiled module in an
embedded server (appWeb), reported that his webserver crashes when printing
multiple syslogs to the system. He believed he’d traced the source of the trouble
via a backtrace to a free() call in openlog(), but
wasn’t sure how to resolve the problem. Had anyone else run into it?

There was a core backtrace included in Nir’s post, and Tony Dovgal asked him
to augment it with a printout of the basic_globals contents:


(gdb) p basic_globals

Nir soon found there was no such symbol in the context of any of the frames
he’d tried; Tony wrote mournfully that this meant it was a multi-threaded
version, which would make things even more complicated. Nir posted the
contents of tsrm_ls instead. Tony could only see from this that
BG(syslog_device) had somehow been free‘d without
being NULL‘d – which shouldn’t happen until shutdown.

Short version: A bad thing happened.


TLK: Reference cycle collector patch

Johannes Schlüter wrote asking for an update on the Zend team’s
benchmarking of David Wang’s reference cycle collector patch, but had no
response. Sebastian Bergmann suggested Johannes simply commit the garbage
collector to PHP_5_3 now and disable it by default, to make testing and
benchmarking easier for all interested parties. Derick Rethans, missing the
mischievous ‘now‘ part, replied bemusedly that this was the plan.

Short version: Too long no talk.


RFC: Late binding for parent (and other options)

Following on from the discussion href="http://devzone.zend.com/article/2798#Heading2">last week
about the behaviour of parent::, Mike Lively came up with three
different patches against PHP_5_3 to work around the issue, as he saw it, of
parent:: being considered an explicit class reference‘. The patches
had different levels of intrusiveness, but all three passed the existing LSB tests.

First up was ‘the parent-forwarding patch‘, which modifies the
behaviour of parent:: so that it no longer alters the
called_class value stored in the executor globals.

Second came ‘the new keyword patch‘, which has the same effect as the
parent-forwarding patch but does it by introducing a new keyword. Mike had no
idea what this keyword should be, so for the purposes of demonstration named
it __parentf:


class
A {
    public static function
style="color: #0000BB">test
() { />        echo style="color: #0000BB">get_called_class(); // output: B />    }
} />
class
B style="color: #007700">extends A
{
    public
static function
test style="color: #007700">() { />        __parentf:: style="color: #0000BB">test(); />    }
}


This patch was relatively high-cost in terms of performance, and Mike hoped
someone might know a cheaper way to achieve the same result.

Finally, there was ‘the forward_static_call() patch‘. This approach
introduces two new functions, forward_static_call() and
forward_static_array(), which are similar in nature to
call_user_func*() except that the first parameter specifies the
target method in the parent class, rather than a callback:


class B style="color: #007700">extends
A
{
    public
static function
test style="color: #007700">() { />        forward_static_call( style="color: #0000BB">__FUNCTION__); />    }
}


This patch, being completely outside the Zend Engine, didn’t impact the
existing LSB functionality in any way. Mike would be prepared to work further
on any of these patches that seemed worth pursuing; he asked for feedback.

Alexey Zakhlestin ventured that he thought the first patch the most natural
and intuitive of the three, which brought the wrath of Stas Malyshev upon his
head. Stas didn’t see how making parent::foo() and
A::foo() work differently was either “natural” or “intuitive”.
Alexey argued that A::foo() had nothing to do with inheritance,
whereas parent::foo() plainly did; keeping the context made
perfect sense to him. Elizabeth Smith agreed; ‘I shudder to think of the
number of “bogus” bug reports that’ll pop up if LSB doesn’t work this
way’
. She believed classname::foo should mean
something other than parent::foo() when it comes to statics.
David Zülke also took this view, but noted that the opposition was
stronger. He suggested that static should always point to the original
callee; get_called_class() should reflect the end of the call
chain, in line with Stas’ argument; and a new function,
get_static_class(), should reflect the start of the call chain.
Stas’ response to that idea was ‘Ouch‘. Not only would
parent:: and classname:: behave differently, but
there would be two separate functions for getting the called class, and one
of them with non-functional notation. Besides, if
get_called_class() was only supposed to return the name of the
parent class, why not use __CLASS__? Mike reminded him that
get_class() also exists. David dropped his idea, and went back
to the original notion – that get_called_class() should maintain
the context of the original callee.

Short version: Stas 1, rest of the world 2.


TLK: Struct initializations

Andy Lester, true to his word, had been making PHP build with more stringent
compiler error checking – specifically, -Wextra under GCC. He
produced a patch to clean up ‘dozens of struct initializations‘ which,
while being valid C code, could hide future problems due to the compiler
output triggered by them. Stas queried this – didn’t the C standard mandate
filling uninitialized fields with 0? – but Marcus Börger
saw it as ‘more correct code‘ and voted that Andy’s patch be applied
in PHP_5_3 and CVS HEAD. Andy responded to Stas, saying that the patch was
intended to benefit humans rather than the compiler. Ilia Alshanetsky
believed {0} sufficiently clear for humans, but Alexey – who, it
should be said, doesn’t spend every day fixing bugs in the PHP core – pointed
out that {NULL, 0, 0} brings the real structure dimensions
directly to the context, potentially leading to fewer header checks.

Andy pointed out that his chief aim was actually to reduce the compiler
warnings triggered by {0}; at present there are so many warnings
about the number of initializers that all are ignored. The key to long-term
maintenance was to clean up the background noise at higher levels of error
reporting. Ilia argued that most compiler warnings can safely be ignored, and
some are even bogus. Pedantic changes make the code harder to read -
particularly when it comes to large and complex structures – and he saw no
benefit in them. Andy replied that if his work wasn’t useful to the PHP core
team, he might as well walk away. Ilia wrote, more gently, that the entire
team welcomed any work in the direction of improved maintainability; the
problem here was one of balance. Compilers don’t always get it right; they
can generate warnings when there’s nothing wrong with the code. That being
the case, it didn’t make a lot of sense to code for the compiler with its
warning levels turned right up; it made more sense to code for readability.
Andy viewed the compiler as akin to a linting mechanism rather than as
something to be circumvented; he saw initialized structs as a kind of
seatbelt. Constifying and fixing signed/unsigned mismatches were on his list
too, but he was trying to get the low-hanging fruit out of the way first.

Stas commented that if a new field is added to a structure, Andy’s changes
would mean having to go through all the code that mentions that structure
adding zeroes. He didn’t see this as helpful, particularly given that the
compiler can implicitly do that job and the C standard explicitly provides
the means for it. He produced coding guidelines and a quotation from the GCC
manual to back up his arguments. Andy pointed out softly that having to
manually add zeroes ensures that all those errors are gone from the build.
Stas retorted that the compiler could do that zero-adding, and it could do it
with {0}. That did it for Andy; he said goodbye.

Short version: That was… quick.


BUG: array_slice parameter types

One Dirk Thomas had a question about parameter typing as it appeared in his
PHP_5_3 snapshot. What exactly was the desired behaviour when:


href="/manual/function.array-slice.html">array_slice style="color: #007700">(array
$array, style="color: #0000BB">int $offset,
int $length);


was passed a non-integer length parameter? He’d found that
casting it to float resulted in an empty array rather than – as
he’d anticipated – a notice or warning. In his opinion, either a warning
should be given or the function should handle the wrongly typed parameter
gracefully. Was this behaviour specific to array_slice(), or
did all PHP functions do the same? There was a bit of a clue in Dirk’s final
sentence: ‘It works with 5.2.x without a problem.

Marco Kaiser and Moritz Bechler both looked into the source and decided that
the length value wasn’t being correctly cast; there should be
something resembling


convert_to_long(length_param);

in there somewhere. Stas demurred; he didn’t see why the length
parameter was being parsed as a zval in the first place. It
should just be parsed as a long from the start.

Short version: That one just went from a one-line fix to a research job in one easy sentence.


RFC: json_encode flags

Sara Golemon made a welcome appearance on internals@ with a proposal for a
new json_encode() feature (posted three times, but what the hey,
it’s Sara so we’ll let her off). She explained that, while it’s technically
safe to include user supplied data in json_encode() serialized
values, certain characters (the usual suspects – <,
>, & and ') left room for error. Her
patch offered an optional second parameter, options, and a set
of new flags to allow the developer to encode those characters to hex
references – not exactly a complete security solution, but ‘one more rope
holding the ship together
‘. Since this adds five characters for each
escaped character, the default behaviour would be to leave them alone, as
now. To demonstrate usage:


echo
json_encode("<foo>" style="color: #007700">);
"<foo>" />
echo json_encode( style="color: #DD0000">"<foo>",
JSON_HEX_TAG);
"\u003Cfoo\u003E" />
echo json_encode( style="color: #DD0000">"<foo bar='baz'>",
JSON_HEX_TAG | JSON_HEX_APOS style="color: #007700">);
"\u003Cfoo
bar=\u0027baz\u0027\u003E"


If there were no objections, Sara planned to commit this change after a week.

Rasmus Lerdorf wrote that this had actually been on his TODO, ‘but you are
way more productive than I am as witnessed by the 3 copies of this proposal we
got. ;)

Alexey wondered what was wrong with using str_replace()? Sara
admitted that it resolves the problem, but pointed out that taking that
approach when faced with a series of nested arrays and objects ‘becomes
onerous
‘. Stas wondered what the patch was supposed to help, given that
it didn’t really offer a security solution. If security wasn’t Sara’s chief
concern, what was? Sara explained that certain browsers misinterpret
legitimate data, and the characters get broken out of their proper context.
Stas asked her to clarify the attack vectors. Sara couldn’t; she wrote that
she didn’t know of any successful vectors currently. Then again, she
would have sworn that echoing htmlentity’d data was safe, until she came
across a browser where it wasn’t. The point of the patch was to give PHP
users the option to be paranoic in their approach. There were existing ways
to achieve the same thing, but they were ‘hacky‘; performing the
escaping inside the call seemed more appropriate to her. Stas responded that
he’d wanted to know because there needs to be some explanation provided about
the new feature, such as when to use it and what it does, and he wasn’t sure
he could easily provide it.

Rasmus pointed out that attack vectors move around, and since the proposed
syntax is valid JSON it would at least give PHP users another tool.
Documenting specific attack vectors seemed of secondary importance to him. He
then documented a specific contextual attack vector against
htmlentities():


<?php $foo = href="/manual/function.htmlspecialchars.html">htmlspecialchars style="color: #007700">(
$_GET style="color: #007700">['foo' style="color: #007700">], ENT_QUOTES); style="color: #0000BB">?>
<a href="" onmouseover="a=' style="color: #0000BB"><?php echo
$foo?>';">Mouse Over Me</a> />

Then try hitting the page and set ?foo=';alert(0);//

Rasmus added that this isn’t a failure of the htmlentities()
function itself; it just means it was used in the wrong context. He could
imagine contexts where having a version of JSON without HTML tags and quotes
could be quite useful.

Short version: Say it once, say it twice, say it again.


TLK: Unicode support and ext/pgsql

PHP user Michael Eshom reported that PHP 6 --with-pgsql was
failing to compile under OpenSUSE 10.3 against a home build of PostgreSQL 8.3
beta 1, and attached a long list of make errors to his post. Tony
committed a patch and asked Michael to try again, explaining that he doesn’t
use PostgreSQL himself and doesn’t have it installed.

Michael confirmed that the fix worked, but noted that there were still a
bunch of “incompatible pointer type” warning messages. Tony explained that
those warnings are expected; they stem from the lack of Unicode support in
the pgsql extension. Given that there hadn’t been much activity from
the extension maintainers lately, he wouldn’t expect that to change in the
near future either. Would Michael care to give them a hand? Michael replied
that he wouldn’t know where to start, but Tony, sensing an easy victory,
regarded this as no problem; ‘we can show you what to do‘. Michael
made a feeble attempt to save himself by writing that he hadn’t done anything
with C over the last 5 or 6 years, but agreed to see what he could do if Tony
was willing to help him out.

Short version: If you want to avoid contributing, don’t post intelligent bug reports.


TLK: Private property confusion

Marco Kaiser posted some
horribly convoluted code
intended to demonstrate that you can access a
private class property by setting that property in an object of the same type,
and then setting that object as the value of a second property. The surprising
thing, to him, was that a child object loaded as a property of the parent failed
to reset the other property – the one it shared with the parent – to its own value:


<?php

class MyClass style="color: #007700">{
    protected
$_object = null;
    private
style="color: #0000BB">$_value =
50;

    // public getters and setters
}

class ChildClass extends style="color: #0000BB">MyClass { />    private $_value
= 100;
}
/>
$class = new MyClass style="color: #007700">();
$child = new style="color: #0000BB">ChildClass(); />
$class->setObject style="color: #007700">($child);

echo $class->getValue style="color: #007700">(); // 50
echo $child-> style="color: #0000BB">getValue();
// 50

style="color: #0000BB">?>

Marco wanted to know why he was able to access a private property in the
first case, and wondered if there was a bug in the second? The child class
should have no knowledge of the parent’s private property, and he
should be able to re-use the name of that property there. Johannes
found his way through the maze (I tidied the example a bit) and managed a
halfway coherent explanation; a) he couldn’t, and b) extended classes know
nothing about private members of the parent class. Etienne Kneuss
added more helpfully that the accessibility check is based on the current
scope, rather than on the instance. Encapsulation is conserved, as private
properties are accessible only through methods of the defining class.

Jingcheng Zhang felt they’d both missed something and there was
probably a bug involved. If var_dump($child) can be used to view
a private property in the parent object, surely it should also be possible to
access that property via a public getter in the child, since by implication
the property is extended. Etienne explained that this is what
protected does; moreover, var_dump() in PHP_5_3 now
provides the name of the origin of an ‘inherited’ private variable:


object(ClassB)#1 (1) {
    ["a":"ClassA":private]=>
    int(2)
}

Short version: Object debugging is a lot easier in PHP_5_3 now.


CVS: This should’ve been a quiet week…

Changes in CVS that you should probably be aware of include:

  • In ext/dom, bug
    #43364
    (recursive xincludes don’t remove internal
    nodes properly) was fixed in PHP_5_2, PHP_5_3 and CVS HEAD [Rob Richards]
  • In ext/mcrypt, bug
    #43143
    (Warning about empty IV with
    MCRYPT_MODE_ECB) was fixed in 5_3 and HEAD [Derick]
  • In ext/pcntl, bug
    #43373
    (pcntl_fork() should not raise
    E_ERROR on error) was fixed across all three branches [Ilia]
  • Core bug #43365
    (Several enums have trailing commas) was fixed across all three
    branches [Jani Taskinen]
  • Core bug #43386
    (array_globals not reset to 0 properly on
    init) was fixed across all three branches [Ilia]
  • In the PDO extension, support for the -
    character in bound parameter names was reverted, making href="http://bugs.php.net/43130">bug #43130 a “won’t fix” [Ilia]

Almost half the mailing list volume this week was sparked by a three-line commit:

wez Tue Nov 27 18:53:17 2007 UTC
Modified files:
/CVSROOT avail loginfo
Log: karma and ml for some pdo bits

+# PDO Specs. CLA required to commit
+unavail||pdo-specs
+avail|wez,andi,stas,iliaa|pdo-specs

The responses started out short and sweet, on the whole:

Please explain what is this ‘pdo-specs’ thing and why CLA is needed to
commit there.
‘ (Tony)

One word: transparency. It is amazing how it helps to discuss things
instead of acting like that.
‘ (Pierre-Alain Joye)

Please remove this from PHP. PHP is open source, making this closed and
bringing in patents at the end makes me very angry. This has absolutely
nothing to do inside PHP.
‘ (Marcus)

but like Topsy they “growed and growed” as it dawned on the old-timers that
the newcomers to php.net didn’t know what this meant. A few people – href="http://news.php.net/php.internals/33482">Lukas Smith (‘I find
it amazing how oblivious to community concerns this all is…
‘),
Rasmus Lerdorf
(‘Keep it out of PHP CVS and host it somewhere else.‘),
Marcus again
(‘If people want to commit they can do it in their free time‘)
- were concerned enough to write very long and passionately angry posts,
much to the bewilderment of many.

PHP user Daniel Brown was one of the bewildered many. Wasn’t it possible that
IBM’s (we assumed) internal policies required them to have CLAs in
place? Perhaps for tax and documentation purposes, or to protect their own
engineers from liability? He agreed that a CLA is not in the spirit of the
PHP project, but didn’t understand how a document essentially offering complete
freedom to redistribute the code could be ‘such a Big Brother issue‘. All
Daniel could see as a threat was that CLA’d modules might lay the groundwork for
new licensing in the future. Rasmus explained that he didn’t have a problem
with anyone needing a CLA for components specific to their own products; he
agreed that only the final license should matter to most people in that
situation. However, general PHP components – such as the whole of PDO rather
than a specific PDO driver – were a different matter. Derick added that the
onus was on the company to fit into the open source model rather than on the
open source model to adapt to commercial needs.

Stas made the mistake of thinking this was a standard internals@ rumpus and
complained about the “us vs them” attitude. Wasn’t it worth considering that
having good database support might be good for all concerned? Pierre, href="http://news.php.net/php.internals/33538">Greg Beaver and I gave
warning growls. Richard Quadling, though, was easily swayed by Stas’ argument,
being another of the bewildered many: ‘If IBM, or anyone, wants to submit
code to PHP it can only improve things.
‘ He went on to extol the virtues
of peer review and the way that nothing gets into CVS without it. I explained
that a CLA renders the peer review process null and void, since php.net no
longer have control over that area of code; the dev team can’t touch it.

Jacques Marneweck of the PHP documentation group posted an update; this
particular CLA appeared to relate only to DocBook XML specifications for PDO.
I pointed out that the same still applies; if a commit to that module breaks
the PHP manual build, we (including the PHP Manual Editor) are prohibited
from fixing it. Philip Olson, who is the PHP Manual Editor, guiltily
remembered an IBM copyright notice attached to the href="http://php.net/manual/en/internals2.pdo.php">internals
documentation for PDO drivers; IBM had given permission to remove that
notice some time ago, but it hadn’t yet been done. He promised to make the update
today‘. Marcus asked him to refuse any copyright other than “The PHP
Group” or “The PHP Documentation Group” in future, ‘otherwise companies
are going to own PHP piece by piece
‘. Hannes Magnusson explained that the
copyright notice had simply been committed; there had been no notification or
request. There was also no indication that those docs were covered by a CLA.
He hoped not, since he’d been committing fixes to them. (They weren’t; that’s
actually impossible.)

Dan Scott wrote a long post
recommending the adoption of a legal entity that third parties could assign
copyright to, rather oddly declaring that the PHP Group and PHP Documentation
Group are not legal entities. (‘Oddly’ because MySQL AB did exactly this with
ext/mysqlnd.) Derick pointed out that in all the countries that signed
up to the href="http://en.wikipedia.org/wiki/Berne_Convention_for_the_Protection_of_Literary_and_Artistic_Works">Berne
Convention copyright is protected by default, copyright notice or no.
Daniel Brown backed him; the only href="http://www.copyright.gov/help/faq/faq-general.html#mywork">reason to
register the copyright would be if infringement were anticipated.

Rasmus wrote another
long post
in response to Richard’s optimistic mail, explaining that
the legal terminology is complex enough to effectively block anyone from
signing the contract without first taking legal advice. Companies with a
patent portfolio have legitimate concerns when it comes to their employees
signing a CLA. Depending on the significance of the module, this could leave
php.net in an untenable position. Adam Maccabee Trachtenberg wrote to back
Rasmus’ arguments, adding that he personally feels less inclined to work on
projects with a CLA attached due to the legal hassle. Interestingly, Adam
reported some success in getting his company’s legal department to agree to
releases of internal code under open source licenses, so they evidently weren’t
unfamiliar with the issues. Marcus had a similar approach; since the problem
with a CLA is that there is usually an element of disclosure, the solution was
to ‘keep your patent stuff at home‘ and simply get legal permission to
work on open source projects.

Stas came back for another round, but we all gave him the mailing list
equivalent of hard stares and he soon went away again.

Dan Scott noticed a CVS account request from Jay Pipes asking for access to
the new module. He wondered why, if this was purely an IBM thing, MySQL AB
employees were signing up for it? After all, MySQL AB href="http://forge.mysql.com/wiki/MySQL_Contributor_License_Agreement">has a
CLA in place for all third-party contributions to the MySQL code base -
perhaps they were the ones pushing for CLA in PHP now? You could almost hear
MySQL AB twitching at the accusation. A few hours later, Jay Pipes posted
something that sounded very
much like an official statement
of the company’s position, emphatically
denying the charge. Essentially, as the owners of a dual licensed code base,
they have to require their contributors to sign a CLA to avoid issues
arising under the commercial license. PHP, on the other hand, has a completely
different licensing and copyright background, and MySQL AB understood that the
core PHP developers don’t favour CLAs. That being the case, while understanding
that ‘CLAs are the default way for achieving clarity on IP ownership‘,
MySQL AB saw no reason to make any changes to the current approach.

Short version: Light blue touchpaper, stand well back.


PAT: 64-bit assembler optimizations

Prior to instigating the small war in CVS, Wez Furlong picked up the mixed
patch for PDO::FETCH_KEYS and friends posted by Hans-Peter Oeri
last week.
He had a number of questions about its purpose, the allocation of an empty
string for every database connection, and the need for this in the PHP core.
Basically, Wez believed that this level of manipulation belongs firmly in
user space.

Hans-Peter wrote back defending his babies, by which time Wez had thrown
his hand grenade and swiftly disappeared. Marcus in the meantime had asked
Hans-Peter for tests to go with his patch, not least so that he could see
what the aims of it were. Hans-Peter happily href="http://news.php.net/php.internals/33468">obliged.

And finally, Brian Shire offered
a patch against PHP
5.2.5 bringing x86-64 assembly optimizations to zend_alloc.c. He
included a simple micro benchmark to test memory allocation, and added
that he’d like to get some feedback and results from others on this. He
also had some 32-bit related questions for the assembler gurus on the team.
I won’t pretend to understand those; see the patch link for details.

Short version: It all sounds great, excepting the PHP 5.2.5 part…

Comments are closed.