Zend Weekly Summaries Issue #345

REQ: Last updated
TLK: Logging native function calls
RFC: HTTP 500 on error
TLK: Building on Windows
TLK: apache_child_terminate
TLK: unicode.semantics
BUG: PHP 6 and array indices
BUG: snprintf/var_export
REQ: Full-text search for SQLite
RFC: array_get
NEW: PHP-GTK 2 beta
CVS: Quiet week
PAT: User streams revisited

11th June – 17th June 2007

REQ: Last updated

Andi Gutmans put in a request for a change to the bugs database. He thought
it would be good to have a ‘last updated’ column so that developers can
easily see whether a bug has been lingering on and ignored forever, or is
actively being dealt with. Could this be added?

Jani Taskinen is still working on making the PEAR bug tracker
project-independent so that it can be used for PHP. He wrote that not only is
this on his list, but the PEAR bug search engine already includes an ‘updated
since…’ select box.

Short version: The benefits of adapting the PEAR tracker are yet to
be fully appreciated.

TLK: Logging native function calls

Markus Fischer, stuck with PHP 4 and a legacy system, wrote that he was
seeing serious problems related to MySQL connections but – despite lengthy
auditing and logging sessions – had been unable to track down the precise
cause of them. Thanks to an unduly complicated code base that uses such
niceties as variable variables to store connection details, his team had been
unable even to identify everywhere that mysql_(p)connect() or
mysql_query() is called. Markus was desperate to the point of
injecting logging calls directly into the C source for those functions, but
wanted to know if anyone had any better ideas.

Alexey Zakhlestin offered him Xdebug. Markus
responded that he was aware of Xdebug’s existence, and had even used it in the
past, but didn’t see how it met his criteria in this case. He needed very
specific information about the MySQL operations, including all the connection
and SQL statement metadata and its relationship to the current request. He
also needed to be able to identify the server IP, the remote IP, the HTTP
host (because there were ‘zillions of vhosts’), the URI and the user
agent string. His main idea had been to have a single point from which to log
everything using the (single) MySQL driver. Was this even possible using
Xdebug… in a production environment?

Stefan Priebsch wrote that it’s possible to configure Xdebug to create a
trace log detailing the parameters passed to the functions called, and
perhaps pick up the environmental information for logging via a custom
function call in the application. In either case, he wouldn’t recommend trace
logging in a production environment, since it would ‘absolutely kill your
performance’. That said, if the only option was to deploy a patched
version of PHP in a production environment, then running a slow Xdebug-based
trace on Sunday morning at 4am might be the way to go.
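
Stefan's suggestion of a custom logging call in the application could look
something like the sketch below. It is only an illustration: the helper name
build_query_log_line() and the log format are assumptions, not anything from
Markus's code base or from Xdebug. It ties an SQL statement to the request
details Markus listed.

```php
<?php
// Hypothetical helper (not part of PHP, Xdebug or the code base discussed):
// builds one log line relating an SQL statement to the current request.
function build_query_log_line(string $sql, array $server): string
{
    $fields = [
        'server_ip' => $server['SERVER_ADDR'] ?? '-',
        'remote_ip' => $server['REMOTE_ADDR'] ?? '-',
        'host'      => $server['HTTP_HOST'] ?? '-',
        'uri'       => $server['REQUEST_URI'] ?? '-',
        'agent'     => $server['HTTP_USER_AGENT'] ?? '-',
        'sql'       => $sql,
    ];

    $parts = [];
    foreach ($fields as $name => $value) {
        // json_encode() quotes and escapes each value so the line stays parseable
        $parts[] = $name . '=' . json_encode((string) $value, JSON_UNESCAPED_SLASHES);
    }

    return implode(' ', $parts);
}

// Usage: call this from the single place where queries are issued.
echo build_query_log_line('SELECT 1', $_SERVER), "\n";
```

This only helps once every query goes through one choke point, which was
precisely what Markus's team could not guarantee.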

Short version: A coder’s lot is not an ‘appy one.

RFC: HTTP 500 on error

Most PHP production installations use the INI setting
display_errors=0. Dmitry Stogov noted that, although this setting
successfully hides error messages from the end user, it can result in a blank
page being served. He offered a patch for review that would send an HTTP 500
response on error.
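
For comparison, a userland approximation of the same behaviour might look like
the sketch below. This is not Dmitry's patch, which works inside PHP itself;
the helper name is_fatal_error() is an assumption made for illustration, and
the code relies on error_get_last() and http_response_code(), both of which
arrived in PHP releases later than the ones under discussion.

```php
<?php
// Hypothetical helper: decides whether an error_get_last() record is fatal.
function is_fatal_error(?array $error): bool
{
    $fatal = E_ERROR | E_PARSE | E_CORE_ERROR | E_COMPILE_ERROR;

    return $error !== null && ($error['type'] & $fatal) !== 0;
}

// At shutdown, turn a would-be blank page into an HTTP 500,
// provided no headers have gone out yet.
register_shutdown_function(function (): void {
    if (is_fatal_error(error_get_last()) && !headers_sent()) {
        http_response_code(500);
    }
});
```

The headers_sent() check reflects the case Uwe describes: the status code can
only be changed while no output has been sent.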

PHP user Richard Quadling liked the idea, but thought that
set_error_handler() or set_exception_handler()
would already offer this option. Richard Lynch wanted it to be customizable,
and mentioned the words “configuration switch”. Dmitry commented that there
are ‘too many configuration options already’; he would prefer not to
add more. Stas Malyshev suggested set_error_code(500). Uwe
Schindler, though, noted that the situation where a fatal error occurs and no
output was yet sent doesn’t leave room for a configuration setting. The SAPI
should return FAILURE to the server from the handler function
and set the status code; the server would then generate the default error
page. Nicolas Bérard-Nault wrote similarly that any code other than
HTTP 500 makes no sense, but Andrei Nigmatulin argued that ‘most web
servers use 500 as “Backend unavailable” (connection refused)’. Nicolas
suggested that he probably was confusing it with HTTP 503. Andrei thought
otherwise. An impasse was reached while everyone else went to look at section
10 of RFC 2616.

Stas meanwhile re-presented his set_error_code() idea, but
Hannes Magnusson challenged it. He believed set_error_code()
would never be called in the situation addressed by Dmitry’s patch, and even
if it somehow could be, why would anyone want the error code changed, and
into what? Stas argued that it’s possible for a fatal error to occur before
any output is produced but after part of the application code has run.
Personally, he wouldn’t want to change the error code anyway; he had simply
been trying to provide a solution for those that had said they might.

Dmitry later committed his original patch into the PHP_5_2 branch and CVS HEAD.

Short version: Missing the point.

TLK: Building on Windows

Zoe Slattery had been trying to build PHP 6 using Visual C++ Express Edition
V8 (AKA “the free version”). There was a closed bug report that
described the problems she was seeing, and she wanted to know whether this
was something that had been fixed in PHP 5 but not CVS HEAD. Rob Richards
replied that it had worked for him ‘the other day’, and wondered which
version of the Platform SDK Zoe had there. Zoe confirmed that she had
downloaded the 2003 SP1 SDK at the same time as the compiler; these should
work together, although Rob noted that the default SDK included with VS had
given him problems.

Pierre-Alain Joye recommended Elizabeth Smith’s online tutorial, but Zoe’s
errors rang bells for Stas. He recommended that she
should set USE_32BIT_TIME_T, and possibly
_CRT_SECURE_NO_DEPRECATE to prevent compiler warnings. Rob wrote
that both should already be set during the build, assuming the compiler had
been detected correctly, and asked Zoe to check the generated Makefile
and ensure that the former was defined.

Zoe finally sent in a joyous note to say that she’d got her build working.
She added that it had been ‘such a pain getting there’ that she wanted to
start over and write up the steps so that others might suffer less.

Short version: More docs for free builds under Windows are on their way.

TLK: apache_child_terminate

Brian Moon wrote to warn followers of the PHP internals list that Gentoo have
finally removed Apache 1.3 from Portage. He added that many other Linux
distros, including Red Hat, haven’t offered Apache 1.3 ‘for years’,
and the only reason his own team are staying with it is that PHP has no
support for apache_child_terminate() in Apache 2. Brian wanted to
know what the problem was there.

Scott MacVicar explained that apache_child_terminate() isn’t even
supported in the Apache API any more, since in the new MPM structure a child
process can be serving dozens of requests rather than just one. As a result,
it’s no longer practical to mark something to exit at the end of a request.

Short version: Not everything is PHP’s fault.

TLK: unicode.semantics

In a post headed “What’s the use of unicode.semantics in PHP 6?”, Jani
Taskinen wondered what exactly the reasoning had been behind adding the INI
directive in the first place. He and Derick Rethans had both attended the
original PHP Developers’ Meeting in Paris, but were unable to recall why this
had been agreed. It made no sense to Jani to upgrade to PHP 6 for its Unicode
support and then disable it – people might as well stay with PHP 5.

Scott MacVicar agreed, and wrote that he thought the ability to toggle
Unicode support would add ‘more confusion and grief for application
developers’, particularly because the setting is
PHP_INI_SYSTEM. He believed the original reason for having the
option was performance-related, but didn’t know whether this was still a
valid concern. Pierre agreed that performance had been one of the issues;
‘the incompatibilities’ had come into it, too. However, he’d prefer a
single mode for PHP 6, and that mode should be Unicode. He added that there
are no other new features scheduled for PHP 6 that couldn’t equally well
become part of PHP 5. Derick agreed with him on all counts.

Rasmus Lerdorf argued that the unicode.semantics toggle is
needed because, without it, people are likely to use the need to upgrade
their code as a reason to stay away from PHP 6. Application developer Tomas
Kuliavas pointed out that changes made with unicode.semantics=on
aren’t compatible with PHP 4 and PHP 5; some scripts simply won’t work
alongside it, because it places ‘very strict checks on string
variables’. Those needing to maintain compatibility with PHP 6 Unicode
mode will need a separate code branch in order to do so. Tomas particularly
dislikes it that he can’t control the unicode.semantics setting
from within his scripts and the interpreter tries to outsmart him, when it’s
switched on, without knowing anything about his coding environment. He added
that Unicode function and variable names won’t work in an international
coding environment; international developers must use something
understood by all, and that – to him – means ASCII and English function names.

Johannes Schlüter backed Tomas. He wrote that, from an application
development perspective, the unicode.semantics setting is even
worse than magic_quotes, because there’s no way to work around
it. Johannes went on to talk about Unicode support from an internal
development perspective, complaining that the UG(unicode) checks
in the code not only make maintenance much harder but also come with a high
performance cost. Unicode support meant BC breaks, and the INI directive
doesn’t help with that. Johannes concluded that ‘that damn setting
should go’.

Rasmus challenged him to name a single BC break or, better still, to file a
bug report if he found one. The whole point was that there shouldn’t be any.
In response to a comment of Pierre’s about the lack of a migration path,
Rasmus pointed out that this is not the same situation as with PHP 5. Nothing
there had been designed to break compatibility with PHP 4. It wasn’t
possible to take this approach with PHP 6 ‘without being completely
inconsistent with how Unicode should work’. Pierre agreed in theory, but
pointed out that in practice migration to PHP 5 had been a headache for many.
However, his point had been about the wisdom (or otherwise) of design choices
intended to minimize migration pain. It hadn’t worked well for PHP 5; Pierre
believed that it only worked for PHP 4 because there had been such an obvious
improvement between PHP 3 and 4 that everyone wanted to upgrade. He
suspected that Unicode is ‘not that appealing for almost all users, even
those who actually need it’, and pointed out that ext/mbstring
already suffices for many.

Jani picked up on Rasmus’ challenge to find a BC break, and argued that
removing register_globals and magic_quotes could be
seen as just that. With that breakage already behind them, why keep themselves
in ‘the BC pit’? He also reiterated Johannes’ points about
UG(unicode), which is only needed because of the option for
strings in PHP 6 to be other than Unicode strings. Tomas pointed out again
that it is possible to work around register_globals and
magic_quotes and have portable code; this wasn’t the case with
PHP 6.

Cristian Rodriguez wrote that unicode.semantics will be useless
if it remains a PHP_INI_SYSTEM setting. Most users won’t be able
to turn it on; hosting companies are likely to keep it switched off because
turning it on will break existing applications. Either the setting should be
PHP_INI_PER_DIR, or PHP 6 should always be in Unicode mode.
Rasmus pointed out that those hosting companies will never upgrade to PHP 6
if Unicode is forced on them and existing applications broken; the team would
be forced to maintain PHP 5 ‘forever’. He personally would rather have
everyone working on the same code base and ‘keep the Unicode vs
non-Unicode battle to a configuration setting’. As for
PER_DIR switching, it had been considered in the past; but
allowing a single process to switch between encodings within the same script
would create far more serious headaches for PHP users than the
INI_SYSTEM switch.

Short version: Insurrection in the ranks.

BUG: PHP 6 and array indices

A Robert Lemke wrote to internals@. He is currently working on the next major
version of TYPO3, a PHP-based content management system; his team are writing
for PHP 6 with unicode.semantics on.

Robert had stumbled across an unexpected behaviour in CVS HEAD and wanted to
check whether he’d found a bug:


<?php

preg_match('/(?P<character>\w),/', 'a,b,c,d', $matches);
echo (isset($matches['character']) ? 'yes ' : 'no ');
// output: no

?>


The reason for this was that the index character in the returned
array was a binary string, rather than the Unicode string Robert had expected
to see. This seemed inconsistent to him.

Andrei Zmievski agreed that it was indeed inconsistent, and added that he’d
try to find time to fix it over the coming weekend.

Short version: Not such a perfect match.

BUG: snprintf/var_export

Derick wrote to report that, while testing some PHP 5 code under the
de_DE locale, he’d discovered that var_export()
uses a comma to represent the floating point decimal separator. Since this
isn’t valid PHP code, he regarded this as a bug in var_export().
On investigation, he’d found that the underlying function,
php_var_export(), uses the following call:


php_printf("%.*G", (int) EG(precision), Z_DVAL_PP(struc));

It seems the G modifier is locale-aware, and will happily create
“5,12” as a “valid PHP float”. This of course does not parse.

Derick had expected the uppercase G modifier to be
locale-insensitive and the lowercase g to be locale-sensitive,
but this wasn’t so; both simply kill trailing zeroes. Changing the
G modifier to be locale insensitive would break
_convert_to_string(); changing the g modifier to be
locale insensitive might not break anything (who knows?). Derick
thought it best to introduce a new modifier that couldn’t possibly impact any
existing code, and suggested H or L for this
purpose. He added, as an aside, that _build_trace_args() in
zend_exceptions.c is also affected by locale sensitivity in the
G modifier.
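
The difference between locale-aware and locale-independent float formatting
can be seen from userland too. The sketch below is only an illustration, not
the internal fix Derick proposed: the helper name float_to_php_literal() is an
assumption, and it leans on the F conversion of sprintf(), which ignores the
locale where the lowercase f conversion does not.

```php
<?php
// Hypothetical helper: render a float as text that PHP can parse again,
// whatever LC_NUMERIC happens to be.
function float_to_php_literal(float $value, int $decimals = 2): string
{
    // %F always uses "." as the decimal separator; %f follows the locale.
    return sprintf('%.' . $decimals . 'F', $value);
}

// Try to switch to a German locale; setlocale() returns false and changes
// nothing if the locale is not installed on this machine.
setlocale(LC_NUMERIC, 'de_DE.UTF-8', 'de_DE');

echo float_to_php_literal(5.12), "\n"; // prints 5.12
```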

Short version: Bitten by a German bug.

REQ: Full-text search for SQLite

Completely by chance, I came across fts2, the full text
search module in SQLite 3, and promptly wrote to Ilia Alshanetsky asking
whether there are any plans for including it in the built-in SQLite 3 library
powering PDO_SQLITE.

Ilia evidently likes the look of it too. He replied that he was simply
waiting for fts2 to be better tested before making it available,
and will probably include it in the next library update.

Short version: Killer feature incoming.

RFC: array_get

One Andrew Shearer came up with a long but thorough post
proposing a new function, array_get():


mixed array_get(array $array, mixed $key[, mixed $default = FALSE]);

where the optional third parameter allows a default return value to be
defined.

Andrew was aware of the long-running ifsetor debate, and reckoned that the
general case for it boiled down to
retrieving values from arrays. This was his solution to the same problem of
an array accessor that doesn’t generate an E_NOTICE when the
target value is missing, with the advantage that it doesn’t require special
language syntax support. He wrote that he had written and benchmarked a patch
introducing the function, but would post it only after responding to feedback.
Finally, Andrew offered a piece of back-compatible userland code replicating
the functionality of the proposed array_get():


if (!function_exists('array_get')) {
    function array_get($arr, $key, $default = false) {
        if (array_key_exists($key, $arr)) {
            return $arr[$key];
        } else {
            return $default;
        }
    }
}


before asking for comments. Strangely, there were none.
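
For anyone wondering how the userland version behaves in practice, here is a
short usage sketch. The function body repeats Andrew's code; the sample array
and its keys are invented for illustration. The last call shows one
consequence of using array_key_exists() rather than isset(): a key that
exists but holds NULL does not fall back to the default.

```php
<?php
if (!function_exists('array_get')) {
    function array_get($arr, $key, $default = false) {
        if (array_key_exists($key, $arr)) {
            return $arr[$key];
        } else {
            return $default;
        }
    }
}

// Invented sample data
$config = ['host' => 'localhost', 'port' => null];

var_export(array_get($config, 'host'));           // 'localhost'
echo "\n";
var_export(array_get($config, 'user', 'nobody')); // 'nobody', and no E_NOTICE
echo "\n";
var_export(array_get($config, 'port', 3306));     // NULL: the key exists
echo "\n";
```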

Short version: Not the fastest way to go about it.

NEW: PHP-GTK 2 beta

Andrei, as project lead, had set the young and thrusting PHP-GTK team a
challenge before heading out to Fiji (lucky man): roll the beta release
before his return.

Unfortunately I was the only other person around with the necessary karma to
put out a release, which would have been better if I’d had enough free time
to do the job. On Andrei’s return I asked him to give Anant Narayanan and
Elizabeth Smith CVS access to the website module, since the pair of them had
been working their bits off to get the project codebase ready for the
release. Andrei promptly did so, and Anant triumphantly announced the beta
the very next day.

Short version: Congratulations, gang!

CVS: Quiet week

Changes in CVS that you should probably be aware of include:

  • In ext/wddx, bug #41527
    (WDDX deserialize numeric string array key) was fixed across all current
    branches of PHP [Ilia]
  • Core bug #41655
    (open_basedir bypass via glob()) was fixed, again
    across all current PHP branches [Ilia]
  • Zend Engine bug #41633 (Crash
    instantiating classes with self-referencing constants) was fixed
    [Dmitry]
  • In ext/json, bug #41673
    (json_encode() breaks large numbers in arrays) was fixed
    [Ilia]
  • In ext/soap, bug #41566
    (SOAP Server not properly generating href attributes) was fixed
    [Dmitry]
  • The bundled PCRE library in CVS HEAD was upgraded to 7.2 RC3, and a
    script added that will automate the upgrade in future [Nuno Lopes]
  • Also in CVS HEAD, core bug #41609
    (file_put_contents() is not binary safe when a binary
    string is given) was fixed [Pierre]
  • Core bug #41693
    (scandir() allows empty directory names) was fixed in PHP_5_2
    branch only [Ilia]
  • GD bug #41717
    (imagepolygon does not respect thickness) was fixed
    [Pierre]
  • A memory_limit interruption vulnerability in
    zend_alter_ini_entry() was fixed, again in PHP_5_2 branch only
    [Ilia]

In other CVS news, Stas made moves to disallow the characters
()@,;:[]?={}&% from unquoted cookies, in line with the relevant RFC.
One day after he’d committed this addition to the character blacklist across
all three current branches of PHP, Stefan Esser removed it. Unsurprisingly,
Stas asked for an explanation. Stefan pointed out that ‘even Zend
Platform’ had been using a colon in session IDs until recently, and there
was no way to know how many others out there may be doing the same.
Particularly when a session ID is generated using base 64 encoding, the final
character is more than likely to be =. This BC break was
completely unnecessary; it was possible to support every character on Stas’
blacklist simply by encoding the ID, and in fact Stefan had just done so in
the source. Finally, the characters aren’t forbidden in the Netscape Cookie 0
format used by PHP in the first place; everything except whitespace and
semicolons is allowed!
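
Stefan's point about encoding can be illustrated from userland. The sketch
below is not his change to the C source; it simply shows, with an invented
session ID, how percent-encoding lets a value containing a colon and trailing
= characters travel in a cookie and come back unchanged.

```php
<?php
// Invented ID containing two of the characters from the blacklist
$sessionId = 'abc:def==';

// Percent-encode for transport, then decode on the way back in
$encoded = rawurlencode($sessionId);
$decoded = rawurldecode($encoded);

echo $encoded, "\n"; // prints abc%3Adef%3D%3D
echo $decoded, "\n"; // prints abc:def==
```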

Short version: Gently does it.

PAT: User streams revisited

Undaunted, Stas offered up a second patch to restrict user streams from
executing dangerous operations within the include context. The difference
between this and his earlier attempt was that the optional argument for
stream_wrapper_register() is now an integer rather than a
Boolean value, in line with a suggestion from François Laupretre.
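
As a rough idea of what an integer argument looks like to the caller, the
sketch below registers a minimal user stream using the signature found in
present-day PHP, where the third parameter is a flags integer and
STREAM_IS_URL marks the wrapper as a URL protocol. Whether this matches the
patch under review in every detail is an assumption; the DemoStream class and
the demo protocol name are invented for illustration.

```php
<?php
// Invented minimal wrapper: serves a fixed string for any demo:// path.
class DemoStream
{
    public $context; // set by PHP when a stream context is in use
    private $data = '';
    private $pos = 0;

    public function stream_open($path, $mode, $options, &$opened_path)
    {
        $this->data = 'hello from ' . $path;
        $this->pos = 0;
        return true;
    }

    public function stream_read($count)
    {
        $chunk = substr($this->data, $this->pos, $count);
        $this->pos += strlen($chunk);
        return $chunk;
    }

    public function stream_eof()
    {
        return $this->pos >= strlen($this->data);
    }

    public function stream_stat()
    {
        return [];
    }
}

// The third argument is an integer flag rather than a Boolean.
// STREAM_IS_URL asks PHP to treat demo:// like a remote URL, so the usual
// URL restrictions on include apply to it.
$registered = stream_wrapper_register('demo', 'DemoStream', STREAM_IS_URL);
```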

Short version: A security model for user streams is in PAT awaiting review.