Zend Weekly Summaries Issue #325


TLK: Legal quagmire
TLK: Globals as CV and friends [continued]
CfP: OWASP Spring of Code 2007
TLK: POSIX locales and PHP 6
NEW: PHP 5.2.1 RC4
TLK: A doccer writes
CVS: Realpath fixes
PAT: New stuff in the cupboard

TLK: Legal quagmire

Almost the entire PHP development team turned into lawyers this week as the
discussion ranged over registered trademarks (too much like hard work), the extent of
the legal protection offered to the name “PHP”, and the discrimination (or otherwise)
inherent in the need to ask permission before using “PHP” in a project name. Stefan
Esser himself had a peculiarly European view of the word ‘discrimination’; as Stas
Malyshev wrote, ‘Too bad we don’t speak Latin here. Otherwise any distinguishing
between anything – like knowing left from right and good from bad – would be called
“discrimination”’. Several people pointed out to Stefan that the PHP License is
OSI approved, and therefore cannot violate OSI rules. William A. Rowe
startled everyone by claiming that Stefan was correct in his assessment of
discrimination, but it turned out to be a ploy. He was actually saying that the
grant of permission to use the PHP ‘Mark’ is outside the OSI remit, which simply
covers the licensing of the code itself. In case this wasn’t clear enough, William
added ‘None of the OSI licenses grant you permission to reuse their Mark for your
purpose…’.

Stefan continued to accuse the PHP Group of abusing the license in order to
advertise their own companies. Rasmus Lerdorf responded with an assurance that no
right to the PHP ‘Mark’ has ever been granted to any company; ‘the only projects
we have granted the right to use the brand are the ones hosted on our CVS server and
developed by what we consider “us”’. He added that the PHP Security Consortium
have also never been granted such a right, and have in fact been asked to rectify
the situation following the fuss on the internals list last week.

Short version: All gone. You can open your eyes now.

TLK: Globals as CV and friends [continued]

Dmitry Stogov looked into Sara Golemon’s efforts to close the ‘localized silence’
loophole, and was less than enamoured of her patch. It didn’t make sense to him to
optimize @$a['b'] but not @$a->b or @$a->b(), and he wrote that he’d prefer to find
a more general solution to the problem.

Sara explained that, although it would be possible to put her change later in the
proceedings – thereby covering all cases – having it earlier meant there didn’t need
to be an extra op value and temporary variable, reducing the performance hit. She’d
noticed that a similar solution was already in place in
do_fetch_property(), where $this->prop is turned into
an immediate FETCH_OBJ, and had now updated her patch to apply some of
the concepts there to the FETCH_(DIM|OBJ) compiled variable
implementations. With that out of the way, Sara offered an alternative approach; the
rewriting step could be expressed as a macro or inline function, allowing the
DIM and OBJ parsers to share the same code. She also
suggested supporting op1->IS_UNUSED for $this – already
part of the do_fetch_property() solution – when $this is
used with array access on objects, but added that enabling the opcode handler to
support it would require further changes in the Zend Engine.

Dmitry agreed that her updated patch would work, but reiterated that he would
prefer a more elegant solution. However, he didn’t have time to spend on it at
present; could she sit on the patch and simply commit it next week if he didn’t get
back to her in the meantime? This provoked a squawk from Andrei Zmievski; the whole
point of the exercise had been to find a quick solution for the HTTP request
decoding work in PHP 6, the preview release of which is now a month behind schedule.
Sara posted a long but soothing message summarizing – for Dmitry’s benefit – exactly
‘what’s being done, how, and why’. She explained the need for HTTP input encoding
detection, which led to the need for a fine-grained runtime JIT mechanism that will
catch fetches during the FETCH_DIM/FETCH_OBJ execution handlers,
which led to the problem of discerning a fetch type in autoglobals, which has now
been fixed in most cases by making autoglobals compiled variables. The exceptional
case is that of the silence operator, which forces non-CV; and Sara’s current patch
was intended to eliminate that exceptional case.

Dmitry thanked Sara for her detailed description, admitting that the end goal had
been hidden from him until now. That said, he had concerns over the idea of the
FETCH_DIM/FETCH_OBJ change; it would be likely to slow
down each array operation; and he still didn’t understand the problem over the
silence operator forcing non-CV. Could she possibly post the entire set of changes
as one patch? Sara obliged, altering the thread title as she went. She thought it
was self-explanatory, but mentioned that there was an aspect of it she didn’t expect
Dmitry to like. She was right, he didn’t. He wrote that the patch added ‘sloppily
built functionality’ into the core and would slow every dimension or property
fetch, both at runtime and at compile-time; this seemed to him a high price to pay
for reliable $_GET/$_POST encoding conversion. Dmitry also
thought that adding a zend_auto_global pointer to
zend_compiled_variable would cause problems for opcode caches; an
indirect index would be safer. All in all, he would like to see the CV patch
already in HEAD rolled back pending discussion, and the entire patch committed
together (or not) following consensus. He was also curious to know why Sara had
rejected the idea of implementing autoglobals as overloaded arrays. Pierre-Alain Joye
answered for her; that approach doesn’t work in all cases, and would entail BC
breakage even if it did.
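
Dmitry’s point about opcode caches can be illustrated in isolation. An opcode cache keeps compiled scripts in shared memory and hands them to other processes. In those processes, a raw pointer into one process’s auto-global table may point at nothing useful. An index into a table that every process builds in the same order still resolves correctly. The C sketch below assumes exactly that. All type and function names in it are hypothetical and are not the real Zend structures.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an auto-global registration entry. */
typedef struct {
    const char *name;
} auto_global;

/* Every process builds this table at startup in the same order, but the
 * table may sit at a different address in each process. */
static auto_global auto_globals[] = {
    { "_GET" }, { "_POST" }, { "_COOKIE" },
};

#define AUTO_GLOBAL_COUNT (sizeof auto_globals / sizeof auto_globals[0])

/* Hypothetical cached compiled-variable record (other fields omitted).
 * It stores an index rather than an auto_global pointer, so a copy kept
 * in shared memory by an opcode cache stays valid in another process.
 * -1 means "not an auto-global". */
typedef struct {
    int auto_global_index;
} compiled_variable;

/* Compile time: look the name up once and remember only its position. */
static int find_auto_global(const char *name)
{
    for (size_t i = 0; i < AUTO_GLOBAL_COUNT; i++) {
        if (strcmp(auto_globals[i].name, name) == 0) {
            return (int)i;
        }
    }
    return -1;
}

/* Run time: resolve the index against the current process's own table. */
static auto_global *cv_auto_global(const compiled_variable *cv)
{
    if (cv->auto_global_index < 0) {
        return NULL;
    }
    return &auto_globals[cv->auto_global_index];
}
```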

Sara, smarting a little over the ‘sloppily built functionality’ part, pointed out
that the slowdown in question would amount to ‘a vector lookup and a couple integer
compares’ for disarmed callbacks. The additional work for armed callbacks should be
deferred, she wrote, pointing to the history of the entire thread to illustrate why. She
recognized, however, that Dmitry was correct about the impact on opcode caches, and
that this was a problem that would need fixing. Dmitry, realizing he’d offended
Sara, explained apologetically that he relies heavily on his dictionary when writing
in English. All he’d intended to say was that her approach seemed over-complicated
to him. Perhaps the team should stop thinking about JIT autoglobals altogether, and
use simple functions like http_get_var($name) instead? Pierre eagerly
agreed; he wrote that his initial proposal had been for a simple JIT for GPC at
runtime, to be used alongside ext/filter. He also liked the idea of a CGI
object, as in Perl and Python; this was not the first time Pierre had
mentioned it. Andrei, however, was unimpressed by the idea: ‘Good luck trying
to retrain millions of programmers to use a CGI object or a function to retrieve GPC
values’. He didn’t see how Sara’s patch would cost so much in terms of
performance, compared with other aspects of PHP 6. Pierre explained that he was more
concerned about complexity than performance issues at this stage, and added that
he didn’t understand the reluctance to commit experimental code into CVS HEAD
when all that’s needed right now is a working solution. Dmitry backed Pierre’s views
over complexity, but pointed out that he isn’t personally able to restrict commits –
he can only give his opinion. If most of the team wanted the patch to go in, it
could go in, assuming it was fixed for opcode caches. Pierre didn’t want Sara’s
solution on a permanent basis – just as a stopgap solution – which was not a popular
view.
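
Sara’s estimate of ‘a vector lookup and a couple integer compares’ for a disarmed callback is easy to picture. The C sketch below assumes a hypothetical table of per-variable JIT hooks that is checked on every dimension or property fetch. The expensive decoding runs only once, on the first fetch of an armed variable, and the slot is disarmed before the hook runs. None of these names come from the actual patch.

```c
#include <assert.h>

/* Hypothetical per-variable hook that performs deferred input decoding. */
typedef void (*jit_hook_fn)(unsigned cv_index);

typedef struct {
    int armed;           /* non-zero until the hook has run once */
    jit_hook_fn hook;
} jit_slot;

#define MAX_CVS 8u
static jit_slot jit_slots[MAX_CVS];  /* static storage: all disarmed */
static int decode_calls;             /* how many times decoding ran */

/* Stand-in for the expensive work, e.g. converting raw GET data. */
static void decode_input(unsigned cv_index)
{
    (void)cv_index;
    decode_calls++;
}

/* Would be called from a dim/property fetch handler on every fetch.
 * The disarmed path is one array lookup and two integer compares. */
static void check_jit_hook(unsigned cv_index)
{
    if (cv_index < MAX_CVS && jit_slots[cv_index].armed) {
        jit_slots[cv_index].armed = 0;       /* disarm first: run only once */
        jit_slots[cv_index].hook(cv_index);  /* the deferred, armed-path work */
    }
}
```

Disarming before calling the hook means that if decoding itself triggers another fetch of the same variable, the hook does not recurse.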

Meanwhile Andi Gutmans had taken some time out of work to catch up with the
internals mail. He started out with a critique of the silencer patch – basically
making the same error Dmitry had, of taking it out of context – before finding
Sara’s explanation of the chain of events that had led up to it. He wrote that ‘it
might actually be worth considering requiring input variables to be fetched through
a new API and/or object’, assuming that the default behaviour isn’t good
enough; it seems a request object is high on the wishlist of those working on the
Zend Framework. Pierre replied that ext/filter provides everything necessary
to implement this, apart from an object interface, but he promised a proposal and
patch to provide that ‘as soon as possible’. Sara changed the subject line
again. She wrote that she also preferred the simplicity of implementation that
request objects would offer; she personally didn’t see their introduction as a
show-stopping BC break. She offered up some pseudo code to demonstrate her
understanding of the approach:


class PHPGetObject implements ArrayAccess {

    private $decoded = array();

    public function offsetGet($varname) {
        if (!isset($this->decoded[$varname])) {
            /* http_decode_get() is a hypothetical decoder, not a built-in */
            $val = http_decode_get($varname);
            $this->decoded[$varname] = $val;
        }
        return $this->decoded[$varname];
    }

    /* plus set, isset, unset of course */
    /* Probably need an iterator too */
}

On the plus side, this would be ‘fast, (mostly) clean, and cheap’. On the
minus side, it would break the following behaviours:

Andi thanked Sara for offering some insight into the impact of
implementing request objects, and asked for a few days’ grace to look into the idea.
Pierre noted that this wasn’t actually what he had had in mind; he hadn’t even
viewed ArrayAccess as an option, due to the BC breakage that would come
with it. However, he concluded, assuming the problems could be overcome at the
Engine level he would be willing to accept Sara’s approach.

Short version: It wasn’t four threads, it just looked that way. Oh, and
request objects seem likely now.

CfP: OWASP Spring of Code 2007

Someone named Andrew van der Stock sent out ‘a tentative call for PHP security
proposals’ mid-week.

It seems that OWASP (Open Web Application Security Project) recently concluded
their Autumn of Code 2006, which Andrew reported as having had some success in
getting certain security projects completed. The group are now keen to launch their
Spring of Code 2007, which will involve their funding 10-15 projects with a minimum
of US$80,000 between them. They are currently aiming to increase both the number of
projects and the available funding through corporate sponsorship.

The purpose of his post was to invite ‘those active in PHP security’ to
consider working on ways to improve the security situation for all PHP applications,
regardless of the skill or knowledge of the application developer. Potential areas
for investment include, but are not limited to, plugging known Web application
security gaps in existing Open Source projects, or researching the root causes of
‘typical’ PHP application security issues and developing a means of combating them.
OWASP aim to fund projects with concrete deliverables over a 3-4 month timeframe,
starting in early Spring. Their only restriction is that the work – and any
documentation – must be released under an Open Source license. OWASP can
provide incubator space on request, and will provide access to mentors specialising
in Web security issues.

They will be accepting project submissions from ‘some time in February’
for around four weeks; would-be corporate sponsors should contact them directly.

Short version: Not so much a Call for Papers as a Call for Projects…

TLK: POSIX locales and PHP 6

PHP user Tomas Kuliavas had been testing the third release candidate for PHP
5.2.1, and wrote to internals@ with the news that the fix for bug #39993 had introduced
LC_CTYPE variable corruption. Tomas had been bemused to find comments
in the bug report stating that the fix was incorrect, and wanted to know how it had
reached CVS. He also wondered how difficult it could possibly be to replace a call
to strcasecmp() with a locale-independent, case-insensitive comparison in
C?

Tony Dovgal asked for more details, mentioning in passing that Tomas was welcome
to post a patch if he had a better solution. Tomas obliged with the test script from
the original bug report, along with the test results for RC3 and an overview of the
behaviour across three PHP versions. He explained that setting LC_CTYPE
to C could currently break any gettext translations not using
bind_textdomain_codeset(). Still incredulous over the fix, Tomas also
pointed out that, in PHP, locale-insensitive strtolower() or
strtoupper() is a matter of a single preg_replace() call –
and if that wasn’t an option in C, perhaps Tony should consider adding
0x20 to all 0x41–0x5A values and using
strcmp() rather than strcasecmp(), given that timezone
identifiers are in plain US-ASCII. Tony, having fixed the bug, retorted that
‘making PCRE a requirement for ext/date is not an option’. The alternative
offered by Tomas seemed over-complicated, when compared with the existing solution
of switching locale every time a string comparison is needed. Tomas argued that PHP
needs a locale-insensitive strcasecmp(), otherwise ‘your
developers will continue hitting string comparison issues in Turkish and
Azerbaijani’. It might be patched in timezone parsing, but it was bound to break
elsewhere; he saw the existing solution as a hack rather than a bug fix. Derick
Rethans swiftly agreed with him, but explained that the patch had been accepted
because it’s the quickest way to resolve the problem for now. Tony also agreed that
it was a hack, but ‘it fixes the problem and does not introduce any new issues
(any more)’. He reiterated that Tomas was welcome to share and discuss any
better ideas on internals@.
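
Tomas’s suggestion is straightforward to sketch in C. Fold only the bytes 0x41–0x5A (‘A’ to ‘Z’) by adding 0x20, then compare, so that the current LC_CTYPE setting never enters into it. This assumes plain US-ASCII input, as with timezone identifiers. The function names are hypothetical. The sketch folds each byte as it goes rather than lower-casing a copy and then calling strcmp(); the result is the same.

```c
#include <assert.h>

/* Fold only the ASCII letters 'A'..'Z' (0x41-0x5A) by adding 0x20;
 * every other byte is left untouched, so the result never depends on
 * the current LC_CTYPE. */
static int ascii_tolower(int c)
{
    return (c >= 0x41 && c <= 0x5A) ? c + 0x20 : c;
}

/* Locale-independent replacement for strcasecmp(), valid for plain
 * US-ASCII input such as timezone identifiers. Returns <0, 0 or >0. */
static int ascii_strcasecmp(const char *a, const char *b)
{
    const unsigned char *s1 = (const unsigned char *)a;
    const unsigned char *s2 = (const unsigned char *)b;
    int c1, c2;

    do {
        c1 = ascii_tolower(*s1++);
        c2 = ascii_tolower(*s2++);
    } while (c1 == c2 && c1 != '\0');

    return c1 - c2;
}
```

Under a Turkish single-byte locale such as ISO-8859-9, the libc tolower('I') can yield the dotless ‘ı’ rather than ‘i’. Byte-level folding of this kind sidesteps that.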

Ilia Alshanetsky, who was responsible for the dubious fix, wrote simply that the
hack works. It only affects a little-used behaviour, and won’t be relevant anyway in
PHP 6 because POSIX locales will not be used there. Nuno Lopes disagreed, and wrote
that setlocale() will be needed for external libraries in PHP 6 –
including the bundled PCRE library. Ilia disputed this, claiming that PHP would do
all the user-locale handling internally. Nuno wanted to know if that meant PHP would
magically issue setlocale() calls during, say, a call to
preg_match() in Portuguese? Ilia replied that PCRE should operate in
UTF-8 mode; Pierre wondered how that would help to make it locale-compliant? Andrei
explained that, with UTF-8, PCRE already ‘knows’ the relationships between the
Portuguese characters, and doesn’t need to rely on POSIX locales. Nuno wanted to
know if there’d be a way to match only Portuguese letters – he rather
suspected there wouldn’t be, without his POSIX locale. He didn’t feel that forcing
preg_anything() to UTF-8 mode was a good idea, for this reason, but
admitted that he hadn’t explored PHP 6 in great enough depth to really be
certain.

Nuno’s comments set Andrei off on a different tack – the problem of not
forcing UTF-8 mode in PCRE for PHP 6. He wrote that it isn’t that simple;
preg_replace() allows array arguments, which can contain
mixed IS_UNICODE and IS_STRING values. Andrei went on to
explain that PCRE in UTF-8 mode doesn’t care about POSIX locales, even in PHP 5, but
added that he believes the ICU regexp extension will actually allow
Portuguese characters in UTF-8 strings to be matched. It just hasn’t been
implemented yet.

Pierre was with Nuno over this; he wanted to know how UTF-8 could know about such
things as words, whitespace and other locale-specific issues? Perhaps this was a job
for the ICU library, as Andrei had already suggested. Andrei wrote something blunt
about the difference between an encoding (UTF-8) and full locale information
(CLDR).

Nuno wondered whether it wouldn’t be possible to have PCRE simply reject mixed
strings? He thought the ICU regexp API would probably be slower than the
PCRE/locale combination, thanks to its use of Unicode property table lookups. Nuno
went on to stun us all by explaining exactly how PCRE supports Unicode character properties.

Short version: You really don’t want to know what happens inside PCRE.

NEW: PHP 5.2.1 RC4

It fell to Ilia, as Release Master for the PHP 5.2 series, to announce the
availability of PHP 5.2.1 RC4 for testing.

Edin Kadribasic followed up with a long list of the win32 binary
goodies available at http://downloads.php.net/edink/, including ZTS builds of the core
distro, PECL bundle, debug pack and Windows installer, and NTS (non-thread-safe) builds of the core, PECL and debug bundles for CGI/FastCGI and CLI only.

Uwe Schindler was quick off the mark. He wrote that RC4 ‘works great’
except that, under Solaris 9, phpinfo() only displays the first
registered PHP stream – regardless of the number of streams actually available. He’d
noticed the same bug in RC3, but had left it alone because he assumed it was related
to the problem described in an existing bug report. Scott MacVicar confirmed that he was also
seeing the bug under Fedora. Tony promptly fixed the bug, thanked them both for
highlighting it, and apologized in the same breath – it had been caused by his own
coding error.

Short version: … and that’s why we have release candidates.

TLK: A doccer writes

PHP Documentation Group member Mehdi Achour wrote to internals@ to complain about
the inadequacies of the system for handling updates in the PHP Manual. Although he
personally follows the CVS commits, bug reports, changelog and the manual’s own user
notes, he still felt he was missing a lot of changes in PHP. Worse, after spending a
year away from the project, Mehdi found no clues as to what had been added, when it
had been added, or whether it had been documented. He proposed a simple addition to
the developers’ CVS commit messages: the keyword @doc
at the end of the message when a change needs to be documented. If there were any
information to add – such as clarification, or a link to some online resource – it
could be added after the @doc tag, but this wasn’t essential.
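
To make the proposal concrete, here is a minimal C sketch of how a logging script might spot the tag. It assumes that @doc appears as a stand-alone word near the end of the message and that anything following it is an optional note. The function name and the whitespace rules are assumptions for illustration, not part of Mehdi’s proposal.

```c
#include <assert.h>
#include <string.h>

static int is_space(char c)
{
    return c == ' ' || c == '\t' || c == '\n' || c == '\r';
}

/* Hypothetical checker for the proposed "@doc" marker in a commit
 * message. Returns NULL if the commit carries no marker; otherwise a
 * pointer to the optional note after the tag (possibly ""), with
 * leading whitespace skipped. */
static const char *find_doc_note(const char *msg)
{
    const char *tag = NULL;
    const char *p = msg;

    /* Keep the last stand-alone "@doc", since the tag is meant to go
     * at the end of the message; "user@docs" must not match. */
    while ((p = strstr(p, "@doc")) != NULL) {
        int starts_word = (p == msg || is_space(p[-1]));
        int ends_word = (p[4] == '\0' || is_space(p[4]));
        if (starts_word && ends_word) {
            tag = p;
        }
        p += 4;
    }
    if (tag == NULL) {
        return NULL;
    }
    for (tag += 4; is_space(*tag); tag++) {
        /* skip whitespace between the tag and the note */
    }
    return tag;
}
```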

Mehdi planned to use the tag in SQL logging, and hoped to store any description
given alongside the standard fields (date, login, CVS branch, changed files and
commit message). This would provide the phpdoc team with the means to generate a
dynamic TODO, including searches by PHP version, extension, assignee and keywords.
They could even add a feature that enabled them to email a request for help to the
developer responsible for a particularly obscure change, at the press of a button.
Did anyone have any thoughts on the matter?

Sara responded. She explained that most of the team, herself included, always
keep the NEWS file up to date, and she personally was willing to extend those
notes into documentation where appropriate. However, the NEWS file only
covers features; things like the quirks in Unicode semantics mode in PHP 6 would be
a little more difficult to track down. Sara promised to make an effort to catalogue
and document the areas she’d worked on, as far as that went. The main issue was that
most of the undocumented additions had been made to PHP 6, and it’s still too early
to start advertising features in the manual that – from the perspective of the end
user – simply don’t exist. She asked where the documentation team stood on that, and
whether there had been any decision made regarding the timing of the appearance of
PHP 6 features in the manual?

Sara thought the @doc idea was a good one, and promised to use the
tag in future. The only thing that didn’t sound good about Mehdi’s ideas, to her,
was the one about the automated email asking for help – the personal touch is more
likely to get a response.

Short version: Don’t ever take a year out. Coming back is hell.

CVS: Realpath fixes

Changes in CVS that you should probably be aware of include:

  • Core bug #40191 (use
    of array_unique() with objects triggers segfault) was fixed in
    PHP_5_2 branch and CVS HEAD, bringing a slight benefit for the 4_4 branch, where
    array_unique() will now return an array with any type of argument
    [Tony]
  • Also in the core, bugs #39367 (clearstatcache() doesn’t clear
    realpath cache), #40092 (chroot() doesn’t clear realpath
    cache), #40200 (The
    FastCgi version has different realpath results than thread safe
    version) and #40231
    (file_exists incorrectly reports false) were fixed
    [Dmitry]
  • In ext/pdo_oci, PECL bug #7295 (ORA-01405: fetched column value is
    NULL on LOB fields) was fixed in 5_2 only [Tony]
  • Stream filter bug
    #40189
    (possible endless loop in zlib.inflate stream filter) was
    fixed in 5_2 and HEAD [Tony]
  • The ovrimos extension was moved out of PHP_4_4 branch to PECL
    [Derick]
  • Also in PHP_4_4 branch, the PEAR bundle was upgraded to version 1.5.0 [Greg
    Beaver]

Tony inadvertently incurred the Wrath of Ilia by committing another PDO_OCI fix
just three minutes after PHP 5.2.1 RC4 was tagged. Tony explained that ‘most of
the PDO_OCI tests’ had failed on his box, and he was simply fixing the problems;
did Ilia want him to revert his changes? Evidently Ilia didn’t on this occasion, but
he made a point of requesting a complete code freeze in the PHP_5_2 branch shortly
after.

Meanwhile in the foggy fields of PHP 6, Marcus Börger was busy adding some more
new bits and pieces. We gained a new PHPAPI function,
php_info_print_module(), as part of a code cleanup, and (on the
userland side) a new option for picking up Reflection info in the CLI SAPI. Typing,
for example:

php --ri gd

will now display the configuration of – you guessed it – the GD extension.
Hopefully this handy little item will find its way into PHP 5.3 (or earlier?) as
well as PHP 6.0. Marcus went on to add an option offering ftruncate
support for memory streams, which he attempted to backport to the PHP_5_2 branch,
but the Wrath of Ilia was upon him. The option is now scheduled for PHP 5.2.2.
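
What does truncation mean for a stream that lives entirely in memory? The C sketch below models a hypothetical memory buffer with POSIX ftruncate() semantics. Shrinking discards the tail, growing pads with zero bytes, and the file position is not changed. It is an illustration only, not the PHP streams code Marcus committed.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory stream: a heap buffer plus a file position. */
typedef struct {
    char *data;
    size_t size;   /* logical length of the stream contents */
    size_t pos;    /* current read/write offset (not touched here) */
} mem_stream;

/* ftruncate()-style resize. Shrinking discards the tail; growing pads
 * the new bytes with zeros, as ftruncate() does for regular files.
 * Like ftruncate(), it leaves the position alone.
 * Returns 0 on success, -1 if the buffer could not be grown. */
static int mem_stream_truncate(mem_stream *ms, size_t new_size)
{
    if (new_size > ms->size) {
        char *grown = realloc(ms->data, new_size);
        if (grown == NULL) {
            return -1;
        }
        memset(grown + ms->size, 0, new_size - ms->size);
        ms->data = grown;
    }
    ms->size = new_size;
    return 0;
}
```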

Sara wasn’t idle either. She committed the patch to expand
allow_url_fopen/allow_url_include functionality that was
offered on the list for review last week.

Short version: A quiet week, largely due to the deep freeze in PHP_5_2 branch.

PAT: New stuff in the cupboard

Derick picked up on Gentoo developer Luca Longinotti’s patch to correctly detect
the rounding fuzz in PHP_4_4 branch, and applied it with thanks.

IBM Research developer Andy Wharmby, who had now ‘found some time to look at
the COM defects’, started bombarding the list with patches and notifications
about bug report updates. Andi suggested he apply for a CVS account, which Andy duly
did. Pierre backed it, just in case anyone wasn’t sure and needed confirmation.
Nobody responded, and Andy continued to post patches until Andi noticed he was still
without an account at the weekend. Needless to say, Andy’s account is now open and
the bombardment at an end.

Another IBM programmer, Caroline Maynard, posted a patch supplying some missing
EXTERN_C() declarations she’d come across when compiling a C++
extension in a non-threaded environment. Her cvs.php.net karma doesn’t extend to the
Zend module; her patch is in the PAT directory, waiting to be noticed.
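
Why do missing extern "C" declarations matter? A C++ compiler mangles function names unless a declaration is marked as having C linkage. A C++ extension that includes an unwrapped C header therefore looks for symbols that the C core never exported. C headers usually wrap their declarations in a pair of macros that expand to extern "C" { and } only under C++, in the spirit of the BEGIN_EXTERN_C()/END_EXTERN_C() pair in the Zend headers. The macro and function names below are hypothetical.

```c
#include <assert.h>

/* Hypothetical linkage macros: empty in C, an extern "C" block in C++. */
#ifdef __cplusplus
# define MY_BEGIN_EXTERN_C() extern "C" {
# define MY_END_EXTERN_C()   }
#else
# define MY_BEGIN_EXTERN_C()
# define MY_END_EXTERN_C()
#endif

MY_BEGIN_EXTERN_C()
/* Declared with C linkage, so a C++ caller links against the plain,
 * unmangled symbol that the C core actually exports. */
int my_core_add(int a, int b);
MY_END_EXTERN_C()

int my_core_add(int a, int b)
{
    return a + b;
}
```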

One Daniel Rozsnyo offered a well-tested (on Gentoo) patch providing full
multicast support in ext/sockets, saying he thought others may find it
useful. He asked that the implementation be checked carefully before being committed,
particularly when it comes to the parameter parsing API and cross-system
compilation. The patch was written against PHP 5.1.4 <sigh /> but is stored in
PAT anyway, just in case anyone had any thoughts about maintaining/extending the
sockets extension at this late stage in its career. Daniel’s background notes are
here.

Scott MacVicar posted a gentle reminder about his mysql_set_charset()
patch, offered back in October and already a PAT resident. He claimed that
one of his clients had recently suffered an exploit because of this issue, and reminded the team that not all shared hosts
make ext/mysqli available. Olivier Hill intervened to explain why his own
attempt at circumventing the problem had been turned down
for the PHP 4 branch back in July. Scott had
missed that episode entirely, and hoped again that Derick would consider this a
security fix and allow it in the PHP_4_4 branch. Failing that, perhaps Ilia might at
least apply the relevant patches in PHP_5_2 and CVS HEAD. The only alternative Scott
could see would be to undertake a scheme of mass education… Ilia responded to this
with an explanation that it would have to wait for PHP 5.2.2 now in any case. Luca
from Gentoo, who evidently also has an interesting collection of patches, wrote that
he’d got around the problem by adding a new PHP_INI_ALL directive to
allow Gentoo/PHP users to define the connection charset for MySQL. He was happy to
share his PHP_4_4 patch (which is unlikely to find favour in the core
because it adds a new INI directive).

And finally, internals developer Michael Wallner – another one with no access to
the Zend Engine source – posted a PHP_5_2 patch to fix
zend_llist_remove_tail(), which he claimed currently fails to reset
zend_llist->head correctly. Again, his post went under the
collective radar; the patch is sitting in PAT.
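
Michael’s patch is not reproduced here, but one plausible shape of such a bug is easy to show in C. Suppose the last element of a doubly linked list is removed and it was also the only element. The head pointer then has to be cleared along with the tail, or it is left pointing at freed memory. The sketch below uses hypothetical names rather than the real zend_llist API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical doubly linked list with head and tail pointers. */
typedef struct llist_node {
    struct llist_node *prev;
    struct llist_node *next;
    int value;
} llist_node;

typedef struct {
    llist_node *head;
    llist_node *tail;
    size_t count;
} llist;

static void llist_append(llist *l, llist_node *n)
{
    n->next = NULL;
    n->prev = l->tail;
    if (l->tail != NULL) {
        l->tail->next = n;
    } else {
        l->head = n;          /* first element is both head and tail */
    }
    l->tail = n;
    l->count++;
}

/* Remove and free the last element. If it was also the first one, head
 * must be reset as well; otherwise head dangles at freed memory. */
static void llist_remove_tail(llist *l)
{
    llist_node *old = l->tail;

    if (old == NULL) {
        return;               /* empty list: nothing to do */
    }
    l->tail = old->prev;
    if (l->tail != NULL) {
        l->tail->next = NULL;
    } else {
        l->head = NULL;       /* the list is now empty */
    }
    free(old);
    l->count--;
}
```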

Short version: The pressure to introduce a charset selector for ext/mysql is mounting.