Zend Weekly Summaries Issue #320
TLK: Focus on *printf
TLK: New Windows build
TLK: Run-time taint support [continued]
NEW: CVS access for Zoe
BUG: assert_options
REQ: Dropping support for Windows 98/ME
CVS: Caffeinated sessions
PAT: Christopher Jones is on the case
TLK: Focus on *printf
Andrei Zmievski, having taken last
week's comments on board, re-presented his and Tony Dovgal's efforts to bring
Unicode support to the printf() family of functions. He planned to
commit it shortly, assuming there were no further objections.
Matt Wilmas noticed a couple of picky little issues with the F and
f specifiers - like, a certain amount of confusion between them. Andrei
fixed those and Matt followed up with a small cleanup patch, which Andrei promptly
committed. Matt hadn't finished yet, though, and posted more patches to fix a bug that had the
%e format specifier giving one decimal place less than advertised. Ilia
Alshanetsky applied the changes, but Hannes Magnusson was unconvinced; he wrote that
the behaviour has always been the same, and the change actually broke a
four-year-old test. Presumably there was some discussion off-list at this point; the
Unicode version was later fixed (by Tony) to work in the way Matt, and the bug
report, expected it to.
Later in the week, Matt posted some questions about locale awareness. The
%f, %g and %G specifiers had it; the
%e and %E specifiers did not, and Matt wanted to know
which was the desired behaviour. Andrei pointed out that POSIX locales are
deprecated in Unicode mode; he didn't think printf() should use
locale-aware formatting by default, because of this. Matt also wanted to know if
there's some reason why printf('%.6g', 1234567890) returns
1234570000, whereas the same figure, following
ini_set('precision', 6);, is reported as 1.23457e+9 (when
cast to a double). He believed the fix was very simple, but again was uncertain
about the expected behaviour in PHP.
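The specifiers Matt was asking about can be compared side by side. This is only a sketch for experimenting: the sample values are arbitrary, and the exact output of the %f, %g and precision lines depends on the PHP version and the active LC_NUMERIC locale, so no output is claimed for them here.

```php
<?php
$n = 1234.5678; // arbitrary sample value

echo sprintf('%f', $n), "\n";   // locale-aware: decimal separator follows LC_NUMERIC
echo sprintf('%F', $n), "\n";   // not locale-aware: always uses "."
echo sprintf('%.2e', $n), "\n"; // scientific notation with two decimal places

// The two cases Matt compared: %g formatting versus the 'precision' INI setting.
echo sprintf('%.6g', 1234567890), "\n";
ini_set('precision', '6');
echo (string) (float) 1234567890, "\n";
```

Running it under different setlocale() settings is a quick way to see which specifiers follow the locale on a given build.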
Short version: Sometimes it's harder to determine correct behaviour than it is to create it.
TLK: New Windows build
Edin Kadribasic notified the list that he intended to switch to 'the new Microsoft compiler' for building the official PHP Windows binary distributions. His test runs had highlighted a few issues, and there had been some adjustments in the build mechanism to accommodate changes in the way distros are packaged. However, Edin now had a usable build, and asked for testers to provide feedback.
Nuno Lopes posted the only response; he'd found the new binary was 'much slower' than the current snapshot built with MSVC++ 6.0. The distributed benchmark script, Zend/bench.php, took five seconds longer with the new build on a Pentium 4 2.0GHz box. Edin went away and ran some tests of his own, and found the direct opposite - the new compiler produced 2% faster code for him. He subsequently enabled the full build for newer Microsoft compilers in CVS.
Short version: Official Windows binaries are now built using Visual Studio Pro 2005.
TLK: Run-time taint support [continued]
There were suddenly a lot of users wibbling about 'bloat'. Zeev Suraski ignored
all of them. He pointed out that, in Web applications, security bugs are 'pretty
much a plague' regardless of the language used; whatever a language can provide
in terms of helping developers make their applications more secure, is useful. Zeev
shared Ilia's fear that taint mode has the potential to become a replay of
safe_mode, but saw it as (effectively) a marketing issue:
If we pitch tainting as a development-time only tool that points out a certain class of security mistakes, and is by no means an invisible magnetic shield that actually protects you from them - then I think it can be quite useful.
One way to dissuade users from enabling taint mode in production would be to make it slow. In fact, Zeev considered making it a compile-time option, with significant overhead and a big notice saying DO NOT ENABLE IN PRODUCTION. Maybe there should even be a new name for it, to kill preconceptions...
PHP user Robert Cummings didn't see the need for an artificially big overhead. He
thought something like E_TAINT would be acceptable in a production
environment 'in the same sense that E_NOTICE is'. He playfully
suggested 'blighting' as a replacement term for 'tainting':
A blight is upon you in /path/to/source/foo.php on line 1
Thankfully - oh I don't know so much - everyone ignored that aspect of Robert's comments.
Jochem Maas felt that the only simple way of ensuring that taint mode doesn't
become 'another magic suicide bullet' would be to prevent it having any
effect at runtime; make it fill error logs rather than break applications. He also
mentioned E_TAINT as a possibility... and definitely not as
part of E_ALL.
Kevin Waterson wondered whether it couldn't be made part of
--enable-debug, but Robert pointed out that this had nothing to do with
debugging PHP internals. Kevin also suggested 'fainting' as an alternative to
'tainting', but this - quite rightly - had no response whatsoever.
Wietse Venema, the originator of this thread, reckoned there might be 'at least' three modes of operation:
- Disabled - The default setting
- Audit mode - Report perceived problems to a logfile
- Enforcement mode - Don't allow execution past a perceived problem
His personal preference would be for taint checks to be usable in a production
environment. Wietse also thought there was no problem with the name 'taint'; he
wrote that he'd never seen it implicated as a cause for security vulnerabilities in
other languages. (Ilia later used Google to dispute this assertion.) To Wietse, the
issue was one of user education. He agreed, though, that false positives would need
to be reduced to a pragmatic level. Wietse dismissed Ilia's (and others') concerns
over enforcing a given coding practice onto users, given that the proposed
untaint would mark data as good for multiple contexts.
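The three modes Wietse listed can be pictured with a small userland sketch. Everything here is hypothetical: the constants, the TaintedValue class and both functions are invented for illustration, and a real implementation would live inside the engine rather than in a wrapper class. It also assumes the black-or-white model, where untainting marks data as good for every context.

```php
<?php
// Hypothetical userland sketch: none of these names exist in PHP itself.
const TAINT_DISABLED = 0; // the default: no checks
const TAINT_AUDIT    = 1; // report perceived problems, keep running
const TAINT_ENFORCE  = 2; // refuse to continue past a perceived problem

// Wraps a value that arrived from outside and has not been checked yet.
final class TaintedValue
{
    public $raw;

    public function __construct($raw)
    {
        $this->raw = $raw;
    }
}

// Black-or-white untaint: the caller vouches for the value after validating it.
function untaint_value($data)
{
    return $data instanceof TaintedValue ? $data->raw : $data;
}

// A "sink" such as output or a query. $log stands in for the audit logfile.
function guarded_sink($data, $mode, array &$log)
{
    if ($data instanceof TaintedValue && $mode !== TAINT_DISABLED) {
        if ($mode === TAINT_ENFORCE) {
            throw new RuntimeException('tainted data reached a sink');
        }
        $log[] = 'tainted data reached a sink';
    }
    return untaint_value($data);
}

$log = array();
$input = new TaintedValue('42');
echo guarded_sink($input, TAINT_AUDIT, $log), "\n"; // still runs; one log entry
```

In this sketch the only difference between audit and enforcement is what happens once a problem has been detected; the detection itself is shared.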
Ilia felt that taint mode doesn't actually fulfill any of its promises; 'there are plenty of Perl applications, supposedly safe from XSS due to tainting, that in reality are trivially exploitable via XSS because the validation regex that does the un-tainting of data is sub-par'. He disagreed, too, with Wietse's ideas about the positioning of 'taint'; he was prepared to guarantee that people would assume taint-checked data is actually safe. Finally, 'different contexts have different validation criteria'; it would be pointless making data safe to print on screen if the plan was to feed that data into a database. Alain Williams argued that PHP users aren't stupid; they could all be taught that 'untainting is only as good as the check that is used'. The tool is useful; it should be part of PHP even if not 100% perfect. Ilia pointed out again that 'good' is a context-sensitive evaluation, and added that it would be unrealistic to offer a partially working tool promising data security and expect users not to rely on it. Alain contended that taint checking is more about raising awareness than anything, and - even partially working - would help in the majority of cases. He wrote:
I wear a seat belt when I drive my car because it will help me in many small accidents. I do not leave it off because it will be useless if a large truck decides to run me down.
Ilia saw a danger there - 'most users will start driving like
maniacs, violating every traffic law thinking that the seat belt makes them
invincible'. Matthew Kavanagh asserted that most users drive like maniacs
anyway... Lukas Smith came into the fray and, dumping that whole whimsical metaphor
business, wrote simply that Ilia was right; a taint mechanism without context
awareness would fail in the majority of use cases. Alain argued that the point of
taint checking was simply to remind the programmer to check that all input fields
pass tests appropriate to their expected data, e.g. $_GET['age']
shouldn't ever be accepted if it contains 'slkfjslfkj'. Lukas agreed on
that point, but reiterated that contextual awareness is paramount. That said, it
would be expensive in terms of development, maintenance and performance. He also
noted that Ruby's taint model is 'simply insufficient' from that perspective,
and asked whether any language actually offers context sensitive tainting at
present?
Pierre-Alain Joye, although personally against the idea of taint checking
altogether, felt it would be wrong to reject something there is clearly a need for.
However, he saw a big difference between a tool for developers or audit teams and an
enforcement mode that would be enabled by many ISPs. He was strongly opposed to
Wietse's third mode; it really would be safe_mode all over again. Robert
Cummings reiterated a point he'd made in a much earlier post. To avoid another
safe_mode scenario, it must be possible to disable taint mode at
runtime, from within the script.
Jeff Moore wrote a very long letter in favour of taint mode, saying basically that the real question should be 'is it better than what we have now?' He saw false positives as a potential irritant, but felt nobody could know how much of a problem they are likely to be at this stage. Interestingly, he also thought there should be no way to bypass taint checking open to the programmer, and the script should die when using data that didn't pass the tests. This was not a muddled analysis; Jeff concluded that taint mode should be offered as a PECL module rather than as part of the PHP core. He was far from alone in suggesting this.
Stas Malyshev bounced back into the discussion to back Jeff's theory that most bugs are there, not because the programmer chose the wrong context when filtering input, but because the programmer forgot to filter the input at all. Pierre pointed out that the filter extension is already available for that precise purpose, and asked if there were anything taint mode might do that ext/filter doesn't. Did Stas really want 'this horrible mode 3'? Stas insisted that he did, and Pierre explained again exactly why many developers would not like to see it. He'd rather see the superglobals dropped and users forced to use the filter functions. Wietse outlined his own plans to Pierre at this point, explaining that 'audit mode' would be the main body of work, as far as the implementation goes, and 'enforcement mode' a kind of tag-on extra. He wasn't too worried that 'enforcement mode' might be turned on indiscriminately, since it wouldn't be particularly useful to most developers to do so. He'd like to provide a way to make it possible to deploy taint checking one file at a time, to ease the burden on the developer. He would also like to keep the options open for future expansion to multiple contexts, although his own investment in the project will be limited to a black-or-white implementation.
Pierre wondered how Wietse expected to solve the problem of ISPs that will enable the mode without first analyzing it, 'just like so many did with safe_mode'. That apart, the strategy Wietse had just outlined was similar to that used in designing ext/filter. In fact, it sounded to Pierre very much like Wietse hadn't actually looked at the extension, and he gave him a couple of links to check out. After all, if there is to be a taint mode in PHP, it will need to work in tandem with filtering, no? Pierre also asked Wietse for some practical specifications and examples rather than abstract ideas; he was having a hard time visualizing the impact of taint, beyond the nightmarish 'mode 3'. Andi Gutmans thought anything more than a black or white taint model was likely to be problematic, whereas the simple approach requires developer know-how to be effective. That said, he agreed with Pierre that the team would be better able to evaluate Wietse's proposal when the proof-of-concept patch is ready.
Stas was firmly on the side of the nightmare; he wrote that, if he had to secure some legacy application, he'd prefer to 'break it hard and then assemble the pieces in the correct way, rather than play find-the-next-hole'. He didn't see the point of a context sensitive taint either, given that (for example) what is 'safe' for one database is not always 'safe' for another. Stas didn't think it should be up to taint to make that kind of decision. Ilia quickly retorted that it would be pointless if it didn't - you could make a scan and check whether raw user data is being passed, without modifying the Zend Engine. In fact you probably wouldn't even need C to do it - you could write that kind of scanner using ext/tokenizer. Ilia had visions of thousands of PHP developers writing an untaint wrapper around all incoming input simply to avoid error messages:
foreach ($_GET as $k => $v) {
Andi replied that static analysis isn't effective, thanks to the dynamic nature
of the language. He added that anyone knowingly writing that kind of code to avoid
error messages rather than untainting that data would be doing themselves an even
bigger disservice than those unknowingly failing to filter their input. Ilia argued
that static analysis can achieve most of what taint mode might, without ever
touching the Zend Engine. As to the second issue, not many applications can
truthfully say they are E_NOTICE free. Most developers either switch
off E_NOTICE altogether or use the error-blocking operator. Why should
this be any different?
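A very rough idea of the ext/tokenizer approach Ilia mentioned is sketched below. The function name and the list of watched superglobals are assumptions made for illustration. It only reports where a superglobal is mentioned; it does not follow the data anywhere, so it is far cruder than a real scanner, and Andi's objection about dynamic code (variable variables, for instance) applies to it in full.

```php
<?php
// Hypothetical helper: list the places where a watched superglobal is mentioned.
function find_superglobal_uses($source)
{
    $watch = array('$_GET', '$_POST', '$_COOKIE', '$_REQUEST');
    $hits = array();

    foreach (token_get_all($source) as $token) {
        // Array tokens are array(id, text, line); single characters are plain strings.
        if (is_array($token)
            && $token[0] === T_VARIABLE
            && in_array($token[1], $watch, true)) {
            $hits[] = array('name' => $token[1], 'line' => $token[2]);
        }
    }

    return $hits;
}

$code = "<?php\n\$id = \$_GET['id'];\necho \$id;\n";
print_r(find_superglobal_uses($code));
```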
Rasmus Lerdorf explained to Stas that his 'find-the-next-hole' scenario is precisely the kind of thing ext/filter was designed to avoid; he saw the difference between taint and filter as being simply in the approach. Rasmus went on to explain at some length exactly how Yahoo! use ext/filter, concluding that taint should be geared to helping developers write code that will be more secure on unfiltered PHP installations. He saw taint essentially as a development tool and filter as a security precaution in deployment, but backed Pierre's conviction that the two would need to know about each other.
Lukas, weighing up the pros and cons, concluded that taint really couldn't improve security significantly without context, which of course set Stas off again. He wrote that blaming taint for users' bad decisions would be like blaming an operating system for failing to prevent somebody stealing a laptop that happens to have it installed. Lukas pointed out that the lack of context awareness would mean data needing to be 'massaged' twice over in many cases, which is just as likely to cause problems as forgetting to do anything about the data at all. He saw it as trading one class of error for another, and introducing additional complexity in the process...
Zeev intervened to clarify matters for Wietse. He pointed out that the entire discussion over the previous few days had focused on issues of perception. The problem with mode 3 was that it implies 'magical' security, regardless of what it actually offers, and there is no way to fulfill such expectations. To be acceptable, taint would need to be pitched as a development tool that helps you find issues, rather than as something that increases application security in itself. Zeev was happy, though, to hear that the implementations of modes 2 and 3 are 'pretty much identical'; it allowed the team to keep all the options open until the proof-of-concept implementation is ready.
Short version: Wietse wrote 'at least three modes...' but will probably be talked down to two.
NEW: CVS access for Zoe
Zoe Slattery, who has been producing .phpt test scripts for the PHP test suite for a while now, finally got around to discussing her situation with Andi and Zeev. They recommended that she ask for CVS access to the PHP core and documentation modules, since there's no simple way to fine-grain it. Rasmus subsequently gave her access to both.
Short version: The aim is to make php.net female-only by 2010 ;-)
BUG: assert_options
IBM's Andy Wharmby wrote to say that a colleague writing new tests for PHP's
assert functionality had found a problem when querying the current setting of
ASSERT_CALLBACK using assert_options(). Andy produced a
test case:
<?php
On investigation, it seemed the code responsible for processing
assert_options() unconditionally returns TRUE, although
every other form of access to the options works as documented in the PHP manual.
Andy later found that the function had been broken during the change to accept the
array(&$obj, "methodname") syntax in July 2001. Returning a
zval containing the option data fixed the problem, but Andy didn't know
if there had been a good reason not to do so in the first place. Could anybody think
of one?
Apparently, nobody could.
Andy followed up a day later noting 'a further defect';
ASSERT_CALLBACK is always NULL until the first call to
assert(), regardless of the setting in the assert.callback
INI directive, following a reworking of OnChangeCallback() in an
attempt to fix a crash bug recently. Andy's solution was to add a RINIT
function to assert.c to populate ASSERTG(callback) as
appropriate, and call it from basic_functions.c:
PHP_RINIT(assert)(INIT_FUNC_ARGS_PASSTHRU);
Andy posted his new patch, and offered the updated assert tests as a bonus following its approval.
Short version: Still waiting for that approval; the patch is in PAT.
REQ: Dropping support for Windows 98/ME
Andi put in a request to dump support for Windows 98 and Windows ME 'from this point onwards'. He reasoned that support for those platforms means staying with the old, inefficient Windows API; it didn't make a lot of sense to do this in PHP, given that Microsoft itself dropped support for both platforms six months ago. PHP users unable to upgrade their OS should stick with earlier PHP versions; 'they aren't getting Windows updates, so why should they get PHP updates?' Unless anyone disagreed, Andi would like to make this effective immediately.
Nobody disagreed, although William A. Rowe warned darkly of forthcoming 'hoots and howls'; Win98SE boxes are surprisingly long-lived, and PHP user Lester Caine mentioned that 'large parts of the world' are stuck with hardware that can't support anything better. Wez Furlong, who blogged about the probability of this happening some time ago, wrote that although he still gets comments every so often on the blog entry, the argument for retaining support was largely about cost. Given that Linux is free, Wez didn't feel this was a major issue. Frank Kromann thought (along with Lester) that the latest PHP 4 and 5 builds capable of running on Windows 98/ME should be flagged as such and kept available for download. Andi pointed out that all PHP releases are in any case available from the PHP museum, but agreed that there should probably be an explanation and a more prominent link to the museum on the download page.
Short version: There's just time to grab a snapshot before Win98/ME support disappears.
CVS: Caffeinated sessions
Changes in CVS that you should probably be aware of include:
- In ext/soap, bugs #39832 (SOAP Server: parameter not matching the WSDL specified type are set to 0) and #39815 (SOAP double encoding is not locale-independent) were fixed [Dmitry]
- fopen_wrapper bug #39850 (SplFileObject throws contradictory/wrong error messages when trying to open "php://wrong") was fixed [Tony]
- In ext/pdo_pgsql, bug #39845 (Persistent connections generate a warning) was fixed [Ilia]
- Ancient core bug #30074 (apparent symbol table error with extract($blah, EXTR_REFS)) was fixed [Brian Shire]
- The core gained a new userland function, stream_socket_shutdown(). This is a wrapper for the system shutdown() function, itself responsible for shutting down part of a full-duplex connection [Dmitry]
- Core functions inet_pton() and inet_ntop() should work under FreeBSD, now that they're correctly defined there (!) [Hannes]
- Iconv bugs #39685 (iconv() - undefined function) and #38852 (XML-RPC Breaks iconv) were fixed [Hannes]
- In PHP_5_2 and CVS HEAD, MEMORY_LIMIT and ZEND_USE_MALLOC_MM are now always enabled, increasing maintainability [Dmitry]
- Zend Engine bug #39903 (Notice message when executing __halt_compiler() more than once) was fixed [Tony]
- In ext/filter, bug #39898 (FILTER_VALIDATE_URL validates etc) was fixed [Ilia]
- Across all current branches of PHP, there is now an internal reference to _SESSION, making it impossible to destroy from userspace [Tony]
- In ext/mbstring, bugs #39361 and #39400 (mbstring function overloading problem) were fixed in the PHP_4_4 and PHP_5_2 branches [Seiji Masugata]
- Core bugs #39873 (number_format() breaks with locale & decimal points) and #36392 (wrong number of decimal digits with %e specifier in sprintf) were fixed [Ilia]
- Reflection bug #39884 (ReflectionParameter::getClass() throws exception for type hint self) was fixed [Ilia]
- Several extensions no longer ignore --with-libdir in the configure line [Derick Rethans]
- In ext/com_dotnet, bugs #33386 (ScriptControl only sees last function of class), #37588 (COM Property propputref converts to PHP function and can't be accessed), #39596 (Creating Variant of type VT_ARRAY) and #33734 'and related' (Something strange with COM Object) were fixed [Rob Richards]
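For the new stream_socket_shutdown() function in the list above, a minimal sketch of a half-close looks something like this. It assumes a Unix-like system (STREAM_PF_UNIX) and uses a local socket pair purely so the example is self-contained; the variable names are arbitrary.

```php
<?php
// A connected pair of local sockets; STREAM_PF_UNIX assumes a Unix-like system.
list($a, $b) = stream_socket_pair(STREAM_PF_UNIX, STREAM_SOCK_STREAM, STREAM_IPPROTO_IP);

fwrite($a, "ping");

// Close only the writing half of $a; $a could still read anything $b sends.
stream_socket_shutdown($a, STREAM_SHUT_WR);

// $b reaches end-of-file once the buffered data has been read.
$received = stream_get_contents($b);
var_dump($received);

fclose($a);
fclose($b);
```

STREAM_SHUT_RD and STREAM_SHUT_RDWR close the reading half or both halves respectively.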
Last week's shenanigans notwithstanding, Andrei committed support for the
b prefix for string literals in the PHP_5_2 branch. Matt, having a
vested interest in the patch, noticed a couple of potential issues where Andrei had
forgotten to include T_CONSTANT_ENCAPSED_STRING in the coverage. He
pointed this out, and also linked to his own original
patch. Andrei smartly added support for the missed token and thanked Matt,
blaming the omission on 'a brainfart'.
Dmitry triggered some discussion when he raised the default
memory_limit from its recently doubled value of 16MB to 128MB. Pierre
wondered why it should be increased again during the RC phase, and Hannes complained
that 128MB was 'insane' for a default setting. Edin didn't see anything wrong
with it, and definitely didn't see anything "insane" there. Hannes wondered how many
PHP scripts even reach 64MB; he couldn't see a way to use up 128MB without there
being 'a serious memory leak in every single function'. Edin pointed to large
file manipulation:
$text = file_get_contents("my 30 meg text file i want to manipulate.txt");
and Derick mentioned 'command line data munging'.
Dmitry, meanwhile, responded to Pierre's initial question with the explanation
that, since PHP is now always compiled with --enable-memory-limit, the
default limit had needed a major increase. Hannes couldn't see why enabling
the memory limit should dramatically increase memory usage, but Wez pointed out that
those previously running without memory_limit enabled could be impacted
by the new restriction if it was too low. The value needed to be something that
wouldn't break most scripts, but would prevent runaway memory allocation.
Hannes was concerned that, by changing php.ini-recommended, the message to
the users would be "we expect the normal PHP script to take up to 128MB RAM". Ilia
retorted that this wasn't about sending a message; it was about preventing breakage
in existing applications.
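Whether a given script comes anywhere near the limit is easy to check. The snippet below is a small sketch using the standard INI and memory functions; the 128M value simply mirrors the new default under discussion, and the figures reported will vary from one installation to the next.

```php
<?php
// ini_set() returns the previous value, or false if the setting could not be changed.
$previous = ini_set('memory_limit', '128M');
$limit    = ini_get('memory_limit');

// Peak memory allocated from the system so far, in bytes.
$peak = memory_get_peak_usage(true);

printf("previous limit: %s\n", var_export($previous, true));
printf("current limit:  %s\n", $limit);
printf("peak usage:     %d bytes\n", $peak);
```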
Over in the steamy jungles of CVS HEAD, Andrei kept himself busy bringing Unicode
support to parse_ini_file(), get_cfg_var(),
set_include_path() and the *printf() functions (with Tony
and Matt's assistance as noted). His commit message for the
ezmlm_hash() upgrade read:
# 50% done, ladies and gents! http://www.php.net/~scoates/unicode
Andrei went on to upgrade get_browser() (returning an
array of browscap values as IS_STRING), metaphone()
(binary runtime-encoded strings only) and number_format() (no
surprises). He also ported the natural comparison algorithm to support
UChar strings - not as easy as it sounds - before ending his week by
introducing a new internal INI handler, OnUpdateUTF8String(). There is
no way to check the string type within the handler, so it always assumes
UTF-8 input.
Pierre completed Unicode support for ext/zip during the week, and Seiji synched the PHP_5_2 branch of ext/mbstring to CVS HEAD ready for whatever changes it may need there. The extension will be retained in PHP 6 to provide back compatibility.
Marcus Börger did some twiddling with the Zend Engine. He implemented a missing
function, zend_u_call_method(), before adding the x
signifier to the parameter parsing API. x can be used for unknown
string types (c = UG(unicode) ? 'u' : 's';). Marcus went on to
implement zend_zstrndup() and zend_ezstrndup(), and wound
up his week with a massive commit making custom serialization work with Unicode
strings, following discussion with Andrei.
Short version: ext/com_dotnet is getting some TLC, and CVS HEAD is more than halfway done!
PAT: Christopher Jones is on the case
Dmitry applied a fix for FastCGI bug #39869 (safe_read does not initialize
errno) in CVS HEAD and PHP_5_2 branch. The patch was supplied by the
bug reporter.
Oracle's Christopher Jones was busy again, providing updated settings for the
distributed INI files. Edin applied his changes. Later in the week, Tony committed
more tests from Christopher, plus his support for the CALL statement
type, in the oci8 extension.
Matt retained his status as 'patch king' this week in many areas (see above),
including a backported minor ext/date optimization to save
strlen() calls in date_format(). Ilia applied it (or at
least, some of it) as part of broader changes.
Nuno removed some pointless code in ext/json, attributing the cleanup to Ron Korving. Frank Kromann offered up a Zend Engine patch that also affected the json extension; his fix allows ext/json in CVS HEAD to be compiled as a shared object under Windows. Tony later applied Frank's code - again, as part of a larger patch.
One Peter Hodge mailed internals@ with a patch to clarify the error triggered
when __get() is called recursively during an attempt to access a class
property. The current message reads "Notice: Undefined property: ...".
Peter's patch alters this to an E_WARNING, "Recursive call to
__get() trying to read property: ...". There was no response to his post, so
the patch is currently sitting in the PAT
directory awaiting review.
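The situation Peter's patch addresses can be reproduced with a few lines. This is a sketch with an invented class name: reading the same missing property from inside __get() does not re-enter __get(), so the engine falls back to a plain property lookup and reports an undefined property. The severity of that message differs between PHP versions, which is why the example collects it with an error handler rather than relying on display settings.

```php
<?php
class Lazy
{
    public function __get($name)
    {
        // The engine guards against re-entering __get() for the same property,
        // so this read is treated as an ordinary access to an undefined property.
        return $this->$name;
    }
}

$messages = array();
set_error_handler(function ($severity, $message) use (&$messages) {
    $messages[] = $message;
    return true; // handled; skip PHP's own error output
});

$value = (new Lazy)->missing;

restore_error_handler();

var_dump($value);
print_r($messages);
```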
Short version: Most stuff went straight in; Peter's error-changing code (ZE) and Andy's assert_options patch need looking at.
