Zend Weekly Summaries Issue #341
TLK: Writing the test suite
TLK: PHP 6 preview
TLK: Assigned bugs
FIX: Bug 41401
RFC: User streams in PHP 6
FIX: Segfaults in PHP_5_2
TLK: Updating to PHP 6
CVS: mysql_set_charset
PAT: Interpolated strings revisited
TLK: Writing the test suite
Zoe Slattery of IBM wrote to alert the PHP core development team that ‘a
few tests for array, string and variable handling functions‘ (read:
loads) had been added over the weekend. Zoe added that the team had found
some difficulty in keeping the test output small enough to be easily read.
They’d invented a kind of code for file naming:
test_name_b.phpt b=basic function test_name_v.phpt v=variation test_name_e.phpt e=error cases test_name_o.phpt o=object specific
but were happy to change their scheme if anyone had a better suggestion.
Tony Dovgal reckoned that the real test name is in the
--TEST-- section of the .phpt file; the name of the file
is pretty much irrelevant and should be kept short. That said, he confessed
to having recently proposed a distinction in the filename when it comes to
64-bit tests because it would make 32-bit test exclusion easier. Jani
Taskinen had another proposal, but Tony didn’t recall the details…
Marcus Börger wanted to see meaningful filenames like
test_name_(basic|variation|error|object)[0-9]{3}.phpt, which would
help find the appropriate test files during development. He wrote, though,
that he hadn’t bothered to do this when generating test files because the
name in the --TEST-- section is more important.
Zoe went with the ‘meaningful’ version, but took the point about the
importance of the --TEST-- section name. She wondered whether
there is any mapping currently between the functions listed in the reference
manual and the test cases that relate to them? It would be nice, she wrote,
to be able to click on a function name and go straight to a list of existing
test cases. So nice, in fact, she might write one.
Raghubansh Kumar, another member of the test-writing team, preferred
meaningful filenames too. He thought it’d be good to have the name of the
function as part of the filename, e.g. a script testing the basic
functionality of current() and next() might be
named current_next_basic.phpt. Tony pointed out that you’d still need
to read the --TEST-- section to find out what precisely was
being tested, but he really didn’t care what colour the wheel was – he just
wanted more tests. Following a further query from Zoe, Tony went on to
explain that the U prefix on the names of the output sections
reflects the expected outcome in PHP 6 Unicode mode. Zoe double-checked that
it’s okay to have EXPECT(F) and UEXPECT(F) in the
same file, and that PHP 6 with unicode.semantics off should
respond in the same way as PHP 5, before discovering that she couldn’t use a
php.ini file to make that difference. Marcus Börger, explaining
that the test suite pretty much ignores php.ini, introduced her to the
test switches -u, -U, -N and, more
helpfully, -h.
Stas Malyshev intervened to say that he thought a test name like
func_01234.phpt would only really be useful to him if he were ‘a
being with unlimited lifespan, infallible memory and never felt bored‘.
Tony suggested that ‘most humans‘ should know to run:
|
- an interesting concept of humanity. Zoe didn’t mind
renaming files as she went, and suggested making the test directory structure
reflect the function reference when it comes to core functions too. Raghu
preferred the structure to reflect the source code rather than the manual,
pointing out that this is in fact what it does currently. That said, he had
some ideas for improvement; he’d like to have sub-directories in
ext/standard/tests/, and he’d also like to see the test suite
structure explained on qa.php.net.
Marcus, going back to Zoe’s initial comments about mapping tests to their
function names, suggested that a simple XML file stored in either the php-src
or phpdoc module would be enough. That way, old files won’t need renaming and
their history can be retained.
Zoe later found that her updates to range.phpt failed when
unicode.semantics=on. Was it OK to add a test she knew would
fail if it happened to be testing an area under development? Derick Rethans
suggested that she should mail internals@ if she was at all uncertain about
Unicode mode behaviour, but commit otherwise. Marcus confessed to having
deliberately committed failing tests in the PHP_5_2 branch to remind himself
to merge changes from CVS HEAD, but added that Zoe should stay in touch with
the relevant developer in such a case. Zoe realized, on reflection, that
she’d no idea whether the behaviour she was seeing was expected or not. The
lines:
threw a single E_NOTICE under PHP 6 native, but two in Unicode
mode. Tony promptly fixed the buglet.
Short version: Tony needs to get out more.
TLK: PHP 6 preview
Marcus unleashed a storm when he posted to internals@ about the lack of
progress in PHP 6. He’d noticed that a few of the core team had taken to
developing in the PHP_5_2 branch and neglecting CVS HEAD altogether, and this
concerned him. As far as Marcus knew, only the filter extension was holding up
the process; since Pierre-Alain Joye hadn’t done the work, it seemed to him
that someone else should step up. Would anyone having in-depth knowledge of
that area care to jump in? Besides that, was anything else still to be done
before PHP 6 could have a preview release?
Tony believed there was still some work to be done on streams, and Stas
reported that there is work in progress to support more of the ICU
functionality, prompting Mike Wallner to ask rather bitterly where
this work is in progress? He’d posted to the php-i18n list about the
ResourceBundle API, but had no response beyond a single message
from Andrei Zmievski. The Clayton formerly known as |0t3k stepped up to offer
help with specific ICU tasks, and added that ICU ResourceBundles
would cause issues in a mass hosting scenario because they’re cached
per-process by locale identifier. Andrei wrote comfortingly that the work
under way only consists of API design and prototyping at present anyway;
information will be posted on php-i18n@ as soon as there’s something to
report.
Johannes Schlüter suggested that, if CVS HEAD is to be taken seriously
as the main development branch, the number of tests failing there should be
reduced. He was seeing 42 failures at this point on his own setup, and this
made it difficult to know whether patches were breaking existing behaviour.
Pierre responded to Marcus point by point. People must MFB (merge from
branch), and if they lacked the time, knowledge or motivation to do so they
should ask for help. If Marcus believed there was work outstanding on
ext/filter, he should simply have asked Pierre – or any other
maintainer – for a status report; but the extension isn’t responsible for
holding up PHP 6 development anyway. Perhaps Marcus had been referring to the
input encoding system, which neither Dmitry Stogov nor himself had had time to
finish? If so, the current status is that the runtime JIT work is complete but
the input decoding work is not.
Marcus thanked Pierre for clarifying the situation.
Andrei grumbled that Pierre hadn’t replied to any of his emails or IRC chat
requests, with the result that Dmitry, Sara Golemon or himself was going to
have to complete the input decoding feature. He would like to ask Pierre to
leave it alone at this stage, ‘since you proved very unreliable in the
past‘. Pierre protested that he’d only offered to lend a hand, not to be
wholly responsible for the project. He’d been active elsewhere because
security or critical bug fixes in the current PHP branch take precedence over
development work in CVS HEAD. This heated exchange between Pierre and Andrei
rapidly escalated, and I intervened to ask both of them to take the
recriminations off-list and get back to the main points raised. There needs
to be a real commitment to making input decoding work and to fixing the
issues in streams; and there are genuine problems in the growing tendency to
view PHP_5_2 as the current development branch. Too many PHP_5_2 patches are
never merged to HEAD. I’m not seeing a groundswell of commitment to PHP 6,
and it’s possible that pushing individual developers too hard isn’t the best
way to beat Perl 6 to release.
Tony backed me, and expressed the hope that I was keeping track of the
missing merges. Cristian Rodriguez saw ‘zend.com’ in my email address, made a
few assumptions, and leapt to the defense of the entire PHP development team.
Heh. Wez Furlong wrote to Pierre; all Andrei had meant was he should just say
if he didn’t have the time to fulfill his obligations. Pierre once more denied
ever having signed up for this particular task in the first place.
Short version: Task seeks owner.
TLK: Assigned bugs
Tony seized the moment to ask Wez about the many COM, PDO and streams related
bug reports assigned to him; should the team be looking for new maintainers in
these areas? Wez wrote testily that he hadn’t assigned them to himself (i.e.
agreed to take them on) in the first place, and there are other people
perfectly capable of doing the work. He thought it obvious that he’d been too
busy to do much himself lately.
Tony, who hadn’t intended his post as any kind of attack, explained more
gently that when monitoring bug reports, if there’s something he can’t fix he
needs a name to assign to that job. He understood that Wez was snowed under,
but that being the case it would be helpful to have another name. Wez thought
it unreasonable to assign bugs to busy people without asking them and then
complain when those bugs aren’t fixed. After all, it wasn’t as if he’d
promised Tony to work on something and then not done it (cue angry response
from Pierre). For the time being, Tony should assign bugs in Wez’ areas of
interest to the other listed maintainers. Tony replied that if Wez (or anyone
else) isn’t available to act as a maintainer it would be more practical to
simply remove his name from the maintainer list.
Sebastian Bergmann stepped up with a suggestion; why not assign bugs to a
group, e.g. extension-name@bugs.php.net?
Stut thought it would be better to leave bugs unassigned than to assign them
to a group and lose accountability and incentive. It made more sense to him
to have maintainers keep an eye on incoming bugs, and the bug database
monitor simply ensure that the reports are correct and in the right area,
without assigning anything to anybody. One Jeffery Fernandez suggested a
status for each maintainer, so that monitors knew which maintainers they
could or could not assign bugs to. And Lester Caine wrote that the only
person assigning anything should be the person actually working on it.
Short version: Another part of the process that needs to be written
in stone.
FIX: Bug 41401
Stas tracked down the problem of unexpected results when dividing by a
negative to a precedence bug in the parser rules. He could see how to fix it
easily enough, but wanted to know if anyone could think of a good reason to
leave things as they were.
Rasmus Lerdorf thought the fix should go in, since the bug was ‘an obvious
mistake in the parser‘. Derick, however, was concerned that changing it
might break existing code, and asked Stas not to apply the change to the
PHP_4_4 branch.
Stas subsequently applied his fix in CVS HEAD and the PHP_5_2 branch only.
Short version: The balancing act strikes again.
RFC: User streams in PHP 6
Greg Beaver had been mulling over ways to allow user streams in PHP 6 without
infringing on the security measures of allow_url_(fopen|include).
Going on the assumption that the allow_url_* directives were
intended to prevent accidental remote execution vulnerabilities such as
|
rather than to prevent users from accessing remote sites altogether, Greg saw
the problem as the marking of all userspace streams as ‘remote’. His solution
was to add a new function, stream_wrapper_set_local(), taking a
registered user stream wrapper as its argument. This would retain the
objective of the allow_url_* options, in that it would still be
more difficult to write insecure code than currently, and truly paranoid
systems administrators could simply add
stream_wrapper_set_local() to their disabled functions list.
Ilia Alshanetsky didn’t like the idea. There’s always the option to use
fsockopen() or cURL if you need to access external data, so why
add a bypass for this? François Laupretre, though, thought it a very
good solution, and explained to Ilia that it wasn’t a question of bypassing
filters; the default behaviour would remain as it stands. The chief point was
that it didn’t allow an internal remote stream to be marked ‘local’.
Stas firmly believed it was a mistake to mark all user streams ‘remote’ in
the first place, and added that it still doesn’t catch UNC remote access
under Windows (as in 1.2.3.4mysharemyfile.php)
anyway. He didn’t like the idea of disabling allow_url_fopen by
default either; it didn’t make sense to him to disallow that when
curl_open(), which presents the exact same security risks, is
allowed. Coming back to Greg’s proposal, Stas wrote that having
stream_wrapper_set_local() would effectively be the same as
reverting the decision to mark all user streams ‘remote’; although it would
resolve the problem at hand, he saw it as a ‘very perverted way of doing
things‘, since users would need to mark their streams ‘local’ every time
they wanted to do anything useful with them.
With that off his chest, Stas offered up a five-point plan of his own:
- Rename
allow_url_includeandallow_url_fopen
toallow_remote_includeandallow_remote_fopen - By default,
allow_remote_include=0,
allow_remote_fopen=1 - Streams can be any of three types – remote, local and user/local
- User streams can be declared when registered as either ‘remote’ or
‘user/local’, with ‘remote’ being the default - When an operation is run on ‘user/local’ stream,
allow_remote_fopenis disabled if
allow_remote_includewas disabled
This way, a user stream by default would be marked ‘remote’ and behave just
like http://. A user stream marked ‘local’ could be used for
includes when allow_remote_include=0, but would not be able to
do things like:
to make the local include work with remote files. Implementing the Stas
solution would require a new optional parameter in the stream registration
functions and some tweaking in the user streams module to toggle
allow_remote_fopen for user/local streams. Finally, there should
be a userland function, stream_is_remote(), to check the stream
flag, rather than forcing users to rely on fopen() erroring out.
Greg liked the idea for its cleanness and ease of implementation, and gave it
his instant support. Cristian Rodriguez, on the other hand, thought remote
includes should be removed completely, and asked if anyone knew of any valid
use cases for them? Unlike fsockopen() and friends,
include and require are security nightmares because
they interpret PHP code directly. Users fall into error with
include rather more easily than with eval().
Stas pointed out that removing the possibility of remote includes would mean
writing a completely separate file access layer for include and
require. Remote access had come about because stream
functionality was introduced to the filesystem layer; since
include uses that layer, it gained remote access along with
everything else. Cristian wanted to attack the root of the problem in PHP 6
rather than apply a band-aid to fix it. Stas wrote that he was open to
suggestions.
Rasmus Lerdorf wanted to know since when allow_url_fopen has
been disabled by default? It certainly wasn’t disabled in his current
checkout, and he didn’t believe it ever should be.
Meanwhile Stefan Esser had picked up on the conversation and advised Cristian
not to bother advising anyone on the PHP development team when it comes to
security, sparking a long and needlessly acrimonious exchange with Stas. The
only thing to come out of it was a short list of the MOPB bugs that are not
yet fixed.
Edin Kadribasic had been thinking of valid use cases for remote includes for
Cristian, and offered him this:
|
Oliver Block was somehow under the misapprehension that Greg had assumed
remote code is rarely malicious. Greg quickly put him right; the whole idea
was to prevent unintentional holes created by inexperienced users,
which was why he liked Stas’ idea of auto-disabling
allow_url_fopen (renamed or not) in a user stream when
allow_url_include=1. It would also be nice if a userspace stream
could register itself as always remote, from the perspective of application
deployment.
Short version: Worse than a truth table.
FIX: Segfaults in PHP_5_2
Sebastian Nohn reported segfaults in PHP 5.2.3-dev when running the Zend
Framework 0.8.0 unit tests. He’d traced the last working copy of PHP, but
hadn’t found the guilty patch yet. Sebastian helpfully included
href="http://cruise.nohn.org/buildresults/PHP_5_2?log=log20070518220114">a
link to the list he’d put together of commits made to the branch during
the relevant timeframe, and asked if anyone else could confirm the crash.
Tony asked for – and promptly received – a GDB backtrace that enabled him to
track down the problem. Sebastian was equally quick to confirm that the patch
Tony created fixed the bug, and Tony subsequently committed it into CVS.
Short version: This is how it’s supposed to work.
TLK: Updating to PHP 6
Tomas Kuliavas wanted to know if it would be possible to make
unicode.semantics configurable directly from a script, i.e.
PHP_INI_ALL. Tony replied that this wasn’t possible, and went on
to explain that all string functions should work with both Unicode and binary
strings. That said, he’d like to know why Tomas felt there was a problem.
Tomas explained that he was trying to work out what would be needed to
upgrade SquirrelMail to PHP 6. SquirrelMail scripts are designed to work with
binary strings, but deal with emails written in a cacophony of different
charsets. If the email body or message parts were converted by PHP, the
strings would no longer match the information in the mail headers. When
unicode.semantics is turned on, current PHP 6 breaks one-time
pad creation and mt_rand(). Furthermore, crc32(),
base64_encode() and fputs() all throw warnings or
notices because they expect binary strings and are passed Unicode strings.
Tomas couldn’t provide sample code just yet, but wasn’t too worried at this
point; he could fix the issues in userspace. His major concern was that
similar checks might be added to str_replace(), the array
functions and ext/pcre, thereby breaking charset conversion functions
‘and any other code that operates with 8-bit strings‘. All these
issues could be avoided if there were a way to disable
unicode.semantics from a PHP script.
Tony offered him the binary cast, but Tomas pointed out that it
throws a parse error in both PHP 4.1.2 and PHP 5.2.0. Because it’s a parse
error, there’s no option to detect PHP 6 before reaching the relevant part of
the script, and SquirrelMail’s minimum requirement is PHP 4.1.x. A better fix
might be something like:
|
but that would mean writing wrappers for every affected function. Also, Tomas
had found that
whereas he’d have expected a single character for UTF-8.
Tony thought the casting hack shouldn’t be required because streams should be
used to return binary data in any case; and he didn’t understand Tomas’
comment about strlen() behaviour. Tomas replied that
0xC4 and 0x85 are the hex codes for a small letter
a with ogonek (ą) in UTF-8. He would expect
|
to return bool(true) in a UTF-8 script, but when
unicode.semantics =on, the output was actually
bool(false). Internal SquirrelMail single byte charset decoding
relies on mapping tables written in hex or octal values; in some cases only
the byte value is evaluated, rather than the entire symbol. In the following
script:
Tomas expected to see ą, but was actually seeing
ą.
Stefan Walk explained that xC4x85 is a Unicode string
containing two codepoints, whereas ą is a Unicode string
containing one codepoint – the equivalent to 0x0105. The
comparison should return FALSE when Unicode support is
enabled. Casting the strings to binary would give Tomas the results he
expected, in all the examples he’d given to date.
Short version: Much confusion.
CVS: mysql_set_charset
Changes in CVS that you should probably be aware of include:
Xmlwriter::WriteElement[ns]can be used to write empty
tags following a non-BC-breaking implementation of href="http://bugs.php.net/41326">feature request #41326 [Pierre]- In ext/dom, bug #41374
(wholetextconcats values of wrong nodes) was fixed [Rob
Richards) - Across all current branches of PHP, a cookie is no longer sent when the
session is passed in the URL [Stas] - Zend Engine bug #41421
(Uncaught exception from a stream wrapper segfaults) was fixed [Tony] - In ext/json, bug #41403
(json_decodecannot decode floats if localeconv
decimal_pointis not ‘.’) was fixed [Tony] - In ext/dbase, bug #41394
(dbase_createcreates file with corrupted header) was fixed
[Tony] - The bundled libsqlite3 was upgraded to SQLite 3.3 in the PHP_5_2 branch
[Ilia] - In ext/curl, bug #41358
(configure cannot determine libcurl’s SSL lib as of v7.16.2) was fixed
[Mike] - GD bug #86 (Possible infinite
loop in libgd/gd_png.c with truncated input) was fixed across all
current branches of PHP [Pierre] - In ext/pdo_mysql, feature
request #41416 (getColumnMeta()should also return table
name) was implemented [Tony] - Core bugs #41430 (Fatal error
with negative values ofmaxlenparameter of
file_get_contents()) and href="http://bugs.php.net/41432">#41432 (extract()does not
accept empty prefix) were fixed [Tony] - In ext/openssl, bug
#41423 (PHP assumes wrongly that certain ciphers are enabled in OpenSSL)
was fixed [Pierre]
In other CVS news, Scott MacVicar sparked an exchange of views when he added
the long-requested mysql_set_charset() to the PHP_5_2 branch and
CVS HEAD. Tony, who didn’t recall there being any discussion about this, asked
Scott to revert the change. He wrote out that the function is simply an alias
for SET NAMES and is already implemented in ext/mysqli.
Stefan Walk defended Ilia’s decision to allow it in ext/mysql, seeing
it as more of a security fix than a new feature. Scott explained further; the
difference between mysql_set_charset() and SET NAMES
is that the SQL statement can’t update the internal character encoding on the
MySQL client. If mysql_real_escape_string() uses a multibyte
charset on, say, latin1 encoded data, the function can fail – which in turn
allows SQL injections. Scott also pointed out that many (most?) hosts don’t
know the difference between ext/mysql and ext/mysqli and only
enable the former; if the PHP team wanted that to change, they should
deprecate ext/mysql. Ilia made much the same point about Web hosting
companies. Tony argued that using mysql_set_charset() means
upgrading to PHP 5.2.3. If you were going to do that with a legacy
application you might as well require ext/mysqli. Ilia mentioned the
rather large gap between what the development team would like PHP users to
use, and what they actually do use. The mysql extension works;
people are happy with it; it’s not going to be dropped. Tony grumbled that if
this argument were followed to its natural conclusion, PHP 3 should still be
supported. The function stayed.
In the meantime, Ilia – who has come under a lot of pressure from the team to
start merging his changes to CVS HEAD again – did just that for the
PDO::FETCH_KEY_PAIR mode he’d introduced into PHP_5_2 branch.
Tony rather incautiously wrote that PDO is so far out of sync in HEAD there’s
no point in merging any changes to it or its extensions. Ilia subsequently
‘borrowed’ some code from ext/hash to optimize md5() and
sha1() digest generation in the core hashing functions,
reporting a 20-50% speed increase for short strings. In PHP_5_2 only.
And Rasmus – bored on the long flight to China, according to his commit
message – backported the system call optimizations already in CVS HEAD to the
5_2 branch.
Short version: If the only way to China was by slow boat…
PAT: Interpolated strings revisited
The bug report #41378 (fastcgi
protocol lacks support for Reason-Phrase in Status:
header) came with a patch attached. Although the code needed a rewrite, Dmitry
later credited the bug reporter, anight at eyelinkmedia dot com, for his help
in identifying the problem area.
Matt Wilmas
href="http://realplain.com/php/scanner_optimizations_5_2.diff">posted an
update for his ‘major optimization for heredocs/interpolated strings’
patch following an off-list discussion with Dmitry, who hoped to commit it at
the end of the week. Stas noted that the patch makes
|
generate a number for the opcode rather than a string, but wasn’t sure if
this was important; he also requested benchmarks for the parser. Dmitry
explained that using a number internally is the whole reason for the speed
boost, since it eliminates runtime conversion. Also, the point was to improve
generated code rather than the parser, building a ‘near optimal
ZEND_ADD_STRING, ZEND_ADD_VAR sequence‘ for quoted strings and heredoc.
Stas noted that ‘a major overhaul of the parser‘ should be
justifiable, and asked again for a benchmark test or tests. He wasn’t seeing
any improvement beyond the margin of error in Zend/bench.php, but
recognized that there isn’t much testing of interpolated strings in the
script. Dmitry conceded the point and
href="http://news.php.net/php.internals/29509">posted a benchmark script,
claiming at least a 2% speed increase with Matt’s patch. Stas was unimpressed,
but wanted to know whether the increase would be greater where complex strings
are used.
Matt explained that optimizing the ZEND_ADD_* opcodes at source
relieves optimizers of the task. Using ApacheBench with no accelerator on his
own real-world applications and scripts, he’d seen a typical speedup of
10-15%. Whether the difference is major or not depends on the number of
opcodes and single-character chunks used (because ADD_CHAR is
now used for the latter, rather than ADD_STRING).
Dmitry subsequently committed Matt’s patch into PHP_5_2 and CVS HEAD.
Short version: A major speedup for some uses of heredoc and
interpolated strings in PHP 5.2.3.


One comment to “Zend Weekly Summaries Issue #341”
June 20th, 2007 at 11:59 am
test comment