Zend Weekly Summaries Issue #361


TLK: T_IMPORT vs T_USE
TLK: Compiled variables and backpatching #2
TLK: Square brackets
TLK: Solaris and getcwd()
TLK: Class posing
RFC: VS 2005 support
BUG: import NAME conflict
TLK: Exceptions in autoload
TLK: Taint support: first results
CVS: Getopt in, ereg out of the core
PAT: Missed one (or two)

30th September - 6th October 2007

TLK: T_IMPORT vs T_USE

Sebastian Bergmann can remember the heady days of PHP 5.0-dev, when the original namespace implementation in PHP was attempted and thrown out. When that implementation was withdrawn, claimed Sebastian, the T_NAMESPACE and T_USE tokens were retained for reasons of forward compatibility. (Read: namespace and use have been reserved words ever since.) Sebastian now wrote to the internals list to recommend that, rather than introducing the new token T_IMPORT, the current namespace implementation should be altered to use T_USE.

Johannes Schlüter corrected him - T_NAMESPACE wasn't retained, although T_USE was. He'd investigated usage of import the easy way, using Google's codesearch facility; the search term "->import(" returned roughly 300 results. The public codebases using import include several major PHP applications - Horde, Tikiwiki, TYPO3 and WordPress among them. Johannes therefore backed Sebastian. If the change could be agreed, he offered to write the patch.

Andi Gutmans wanted to stay with import, if possible. He believed that people generally would expect use to include files. That said, he recognized the problem, and pointed out the same issue probably arises with namespace. What Andi really wanted to do at this point was investigate whether it is possible to support reserved words as identifiers, and whether doing so would make sense. For now, he asked that T_IMPORT be left as it was; 'We're still far enough off from release that we don't need to finalize today.'

Greg Beaver and Stas Malyshev immediately started thinking about how it could be done - Greg actually produced a mostly-working solution later in the week - but Sebastian Nohn pointed out that PHP users generally would care less about the appropriateness of the term than about it breaking their software. He pointedly added serendipity and the Zend Framework to Johannes' list of public code currently using import. Andi responded; he had no intention of breaking any software, much less the ZF; he just wanted to see if this other approach was feasible. If not, it would have to be use.

Mike Ford completely disagreed with Andi over the connotations of the word import. He felt that use implies 'something that's already lying around ready to be used', whereas import would go and fetch something. However, he conceded, 'in the global polyglot marketplace, this argument may not have much force'.

Short version: The naming of 'import' isn't important enough to justify the pain.

TLK: Compiled variables and backpatching #2

Paul Biggar returned. He thanked Stas for his explanation of compiled variables, but still had no idea what "backpatching" might be. Could someone please oblige?

Stas did so, but since the term isn't generally used by the PHP development team this is all a little esoteric. He believed that "backpatching" describes the act of updating opcodes for parsed code to give them awareness of something that couldn't be known at the time of parsing. For example, in an if-else statement, when the if condition is false you need the else part, but that hasn't been parsed yet. To get around this, the opcode created when if was parsed is updated on parsing else. Stas added, rather enigmatically, that backpatching is also used to compose variable expressions, e.g.

$foo[$bar]->x->$y->z;
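To make the if-else case concrete, here is a minimal sketch of the idea in PHP itself. It is not how the Zend Engine is written; the function compile_if_else() and the opcode names JMPZ, JMP and ECHO are invented for this illustration. The point is only that a jump is emitted with an unknown target, and the target is filled in later once the parser reaches the place the jump should land.

```php
<?php
// Illustrative only: a toy "compiler" that emits opcodes as arrays.
// compile_if_else() and the opcode names are assumptions for this sketch.
function compile_if_else(string $cond, string $then, string $else): array
{
    $ops = [];

    // Jump to the else branch when the condition is false; target unknown yet.
    $ops[] = ['op' => 'JMPZ', 'cond' => $cond, 'target' => null];
    $jmpzIndex = count($ops) - 1;

    $ops[] = ['op' => 'ECHO', 'value' => $then];

    // After the then branch, skip over the else branch; target unknown yet.
    $ops[] = ['op' => 'JMP', 'target' => null];
    $jmpIndex = count($ops) - 1;

    // The else branch starts here, so the earlier JMPZ can now be backpatched.
    $ops[$jmpzIndex]['target'] = count($ops);
    $ops[] = ['op' => 'ECHO', 'value' => $else];

    // The statement ends here, so the JMP can be backpatched as well.
    $ops[$jmpIndex]['target'] = count($ops);

    return $ops;
}

$ops = compile_if_else('$x', 'yes', 'no');
foreach ($ops as $i => $op) {
    echo $i, ': ', json_encode($op), "\n";
}
```

Both jumps are created before their destinations exist, which is exactly the "couldn't be known at the time of parsing" situation Stas described.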

Short version: So now you know. (If you followed that, anyway.)

TLK: Square brackets

Alexey Zakhlestin had discovered the PDM notes for PHP 6 from way back when. The section that particularly caught his eye was the one about square brackets:

"For both strings and arrays, the [] operator will support
substr()/array_slice() functionality"

Was this behaviour likely to appear in PHP 5.3?

Ilia Alshanetsky thought not; it hadn't been on the table for discussion. Besides, he personally felt it was 'a bit too much magic'. Tony Dovgal agreed; 'too Perl-ish for me'. Stas simply saw no point in it; 'since we do have substr()/array_slice() there's no need to overload the [] operator'.

Andrei Zmievski wrote that it was on his TODO list.

Martin Alterisio immediately asked how it would impact SPL's ArrayAccess and related interfaces and objects. Would there be an interface to the new functionality, and if so, how would ranges be passed to it? Would it be consistent with substr() and array_slice(), if used alongside ArrayAccess? Stas wrote, rather smugly, that this was exactly the problem with such syntax. Alexey thought it could be made to work with ArrayAccess, but would be slow; the requested elements would have to be queried one by one and then combined into an array. He also thought that adding an interface would solve the problem, and that ranges could be passed in the same way as with the [] operator. Martin replied that, in that case, rangeSet() shouldn't have a type-hinted third parameter, since that argument might well implement ArrayAccess.
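Alexey's "query the elements one by one" approach can be sketched in userland as follows. This is only an illustration under assumptions: the Letters class and the slice_access() helper are hypothetical names, integer offsets are assumed, and the code targets PHP 8.0 or later for the mixed type.

```php
<?php
// Hypothetical container used only for this illustration (PHP 8.0+).
final class Letters implements ArrayAccess
{
    private array $items;

    public function __construct(array $items)
    {
        $this->items = $items;
    }

    public function offsetExists(mixed $offset): bool
    {
        return isset($this->items[$offset]);
    }

    public function offsetGet(mixed $offset): mixed
    {
        return $this->items[$offset];
    }

    public function offsetSet(mixed $offset, mixed $value): void
    {
        if ($offset === null) {
            $this->items[] = $value;
        } else {
            $this->items[$offset] = $value;
        }
    }

    public function offsetUnset(mixed $offset): void
    {
        unset($this->items[$offset]);
    }
}

// Hypothetical helper: builds a slice by asking for each offset in turn,
// which is why Alexey expected this route to be slow.
function slice_access(ArrayAccess $source, int $start, int $length): array
{
    $slice = [];
    for ($i = $start; $i < $start + $length; $i++) {
        if (isset($source[$i])) {
            $slice[] = $source[$i];
        }
    }
    return $slice;
}

$letters = new Letters(['a', 'b', 'c', 'd', 'e']);
$middle = slice_access($letters, 1, 3);
echo implode(',', $middle), "\n";
```

Each element costs one offsetExists() call and one offsetGet() call, so a dedicated range interface of the kind Martin and Alexey discussed would mainly be a way of avoiding that per-element overhead.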

Mike Ford slapped Tony down for throwing out a potential feature simply because it reminded him of another language. It took a while for Tony to convince him that 'too Perl-ish' is simply shorthand for 'too cryptic and makes no sense because it duplicates already implemented functionality (more than one way to do it, yeah)'. That apart, Mike's main complaint was that Tony was arguing against 'a firm decision of an eminent group of PHP core developers' who were 'committed to implementing this much-needed feature for PHP 6'. Tony pointed out that most of the eminent developers had in fact changed their minds since that meeting, as indeed they had about several of those firm decisions, which were never firm decisions in the first place. Tony wasn't just prepared to argue; he would do all he could to block 'such a useless feature' - it was just another syntax alias for substr(). Stas wondered aloud what Mike "needed" it for, given that the functionality itself already exists in PHP.

Andrei saw this whole exchange as an exercise in double standards, particularly since Stas happened to be involved in a deep discussion about 'a very esoteric feature' - class posing - at the time. He didn't see how the mention of square brackets justified a knee-jerk reaction. Stas retorted that it was OK to discuss anything related to PHP on the internals list; 'and if you like me to be equal-opportunity knee-jerk-reactor, here it goes: I don't think both of these two are really needed :)'.

Larry Garfield wanted to know if an ArrayAccess object works with array_slice() currently? He reasoned that if it does, [x, y] would be purely syntactic sugar. If it doesn't, [x, y] would be a powerful new feature. Alexey confirmed that it doesn't, but Derick Rethans argued that it wouldn't actually make any difference whether it works with array_slice() or not. Marcus Börger clarified this; ArrayAccess was designed not to work in array functions. It should support array syntax, though, so it will support [x, y] if the feature goes into PHP. However, Marcus added, he found that kind of slicing 'too Perl-ish' too.

Alexey, having seen 'Perl-ish' explained multiple times by now, wrote that he didn't find the syntax cryptic at all - in fact, it appeared to him to make some array algorithms more readable. Besides, it would make userland implementations of array_slice() possible... Tony did his best to see this as a genuine need, but failed miserably.

Short version: Too Perl-ish.

TLK: Solaris and getcwd()

One Rob Thompson turned up on the internals list, hoping to resolve a PHP bug he was seeing under Solaris. In particular, he wanted to know why the PHP function getcwd() sometimes fails. This occurs when a path component has no read permissions, and appears to be a security feature in Solaris itself. Rob had found that to get through this barrier directly on the system you'd need to either execute a suid-root getcwd() or (wait for it) "tell Solaris you already know where you are". The way to achieve this was by changing directories and using a fully-qualified path; if the attempt failed, the library getcwd() would return NULL.

Having gained everyone's sympathetic attention with that last piece of information, Rob went on to ask questions. Firstly, he wanted to know, is there any way for a non-root instance of PHP to "know where it is" in the directory tree, which would make the directory-changing remedy possible? Secondly, could anyone confirm that PHP's include() requires the POSIX library's getcwd() in order to manage relative paths? Thirdly, assuming that the former is possible and the latter correct, could the Solaris workaround be used to make PHP's getcwd() work?

Tony braved the deep waters. For PHP to "know where it is", it would need to call getcwd(), which plainly doesn't work on Solaris. However, Rob was correct in assuming that the POSIX getcwd() - or an equivalent appropriate to the system - is needed in order for relative include paths to work. The problem with the Solaris recommendation was that it required you to know where you were prior to the chdir() call. To know this, you would need to call getcwd()... Rob went away and read some more. Much later, he confirmed that this was 'a "chicken or the egg" issue'. Worse, 'with Solaris, there isn't even any chicken'.

New question. Would a sensible approach be to check for a NULL getcwd() return value, replacing it with . (AKA 'you are where you are, wherever that is') for the PHP getcwd() call? Rob believed this would resolve the problem of relative include paths, but wasn't sure of the security issues arising from the solution. He later offered a patch to achieve this, but Tony gently advised him to look at the TSRM module, where the real work takes place, rather than trying to fix it in the stream wrapper.
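Rob's proposal concerned PHP's internals, but the same fallback can be sketched at script level. The helper name normalize_cwd() is hypothetical, and the sketch assumes PHP 8.0 or later for the union type; the userland getcwd() signals failure by returning false rather than NULL.

```php
<?php
// Hypothetical helper: treat a failed getcwd() as "." (the current directory,
// wherever that is). Whether this is safe from a security point of view was
// the open question in the thread, and this sketch does not answer it.
function normalize_cwd(string|false $cwd): string
{
    return $cwd === false ? '.' : $cwd;
}

$here = normalize_cwd(getcwd());
echo $here, "\n";
```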

Short version: Solaris... gotta love it in order to live with it.

TLK: Class posing

Sebastian Bergmann had been reading books again, this time about something called 'posing' in Objective C. Basically, a class can completely replace another class within an application; the replacement class poses as the target class, and will receive all messages intended for the target class. There's a single restriction pertinent to PHP; 'a class may only pose as one of its direct or indirect superclasses'.

Earlier in the year, Johannes apparently implemented class posing for PHP as a proof-of-concept exercise. Having seen that it can work, Johannes, Marcus, Sara Golemon and Sebastian had discussed where to put the functionality. There was a choice of deserving PECL extensions - operator or runkit - but to make it viable for projects like Sebastian's PHPUnit, it would need to be in the PHP core. Sebastian could definitely use it, and would love to see it in PHP 5.3. What did others think?

Guilherme Blanco was sure he could find a use for it too. However he thought it might be better implemented using a magic method - something like __new() - rather than by using a function to handle overloads. Guilherme reasoned that this would free the class of inheritance issues, making it better suited to super and extended classes. Strangely enough - or perhaps it isn't strange at all - __new() had been Sebastian's initial proposal to Johannes, but there had apparently been some performance implications with that approach.

David Zülke wanted to know if it would be possible to overload final classes? Sebastian replied that it would; the compile-time class declaration isn't affected. Class posing works by intercepting run-time object creation.

Richard Quadling and Stas both commented that Sebastian's code 'looks like a factory pattern', and asked why he couldn't implement it in standard PHP? Stas disliked the introduction of 'very "magic" things', objecting that it would be impossible to know which class was being instantiated by a call to new Foo(). In his opinion, class posing belonged firmly in PECL. Sebastian explained that factory and singleton patterns were simply examples of class posing at work; the reason he wanted to use them in PHPUnit was to improve the mock objects system there. For example, given:

public function foo() {
    $bar = new Bar;
    $return = $bar->doSomething();
    // do something with $return
    return 'some value';
}

he would like to be able to 'stub out' the Bar class and have Bar::doSomething() return a pre-configured value. At present, he can't pass a stubbed version of the Bar class into the method, so there's no way to do that. Using class posing, he could override the call to new Bar() and have it implement the stub class.

Stas wondered how Java's unit tests manage without allowing class replacement. He wasn't convinced that allowing it in PHP would be a good solution, and suggested that Sebastian look to see how unit testing is achieved without this feature in other OO languages. Sebastian pointed out that stubs and mock objects aren't the same thing as unit tests; they are simply tools that allow better unit tests to be written. Stas asked if any of the known unit test systems use them, and if so, how they are implemented. Jared Williams believed they usually rely on the dependency injection pattern, and produced links to both a Java implementation with a PHP port and a more lightweight PHP implementation. However, setting them up is expensive, in terms of implementation registration, reflection performance etc. Stas thanked Jared and promised to look into them; 'I am extremely uncomfortable with an Engine change that would allow "new Foo()" to produce an object that is not Foo.'
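For readers unfamiliar with the dependency injection pattern Jared mentioned, the following is a hand-rolled sketch of how it sidesteps the problem in Sebastian's example. It does not reproduce either of the implementations Jared linked to; the class names Subject and StubBar are invented here, and typed properties require PHP 7.4 or later.

```php
<?php
class Bar
{
    public function doSomething(): string
    {
        return 'real result';
    }
}

// Hypothetical stub: returns a pre-configured value instead of doing real work.
class StubBar extends Bar
{
    public function doSomething(): string
    {
        return 'stubbed result';
    }
}

// Hypothetical class under test. Bar is passed in rather than created with
// "new Bar" inside foo(), so a test can hand over a stub.
class Subject
{
    private Bar $bar;

    public function __construct(Bar $bar)
    {
        $this->bar = $bar;
    }

    public function foo(): string
    {
        $return = $this->bar->doSomething();
        return 'got ' . $return;
    }
}

$production = (new Subject(new Bar()))->foo();
$underTest  = (new Subject(new StubBar()))->foo();
echo $production, "\n", $underTest, "\n";
```

The catch, and the reason Sebastian kept arguing for posing, is that this only works when the code under test can be written or rewritten to accept its collaborators from outside.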

Timm Friebe suggested refactoring the source code. Not being able to do so was rare, in his experience. That said, if Sebastian really needed to intercept construction, he could write a file:// stream wrapper or filter to intercept calls to include and require, and replace calls to new Foo() with newinstance('Foo') in the source.
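Timm did not post an implementation of newinstance(), so the following is only a guess at the shape such a function could take once the source has been rewritten to call it. PoseRegistry, its methods and FooStub are hypothetical names; the superclass check mirrors the Objective C restriction Sebastian quoted. The object return type needs PHP 7.2 or later and typed properties need 7.4.

```php
<?php
// Hypothetical registry mapping a target class to the class posing as it.
final class PoseRegistry
{
    private static array $map = [];

    public static function pose(string $target, string $replacement): void
    {
        // A class may only pose as one of its direct or indirect superclasses.
        if (!is_subclass_of($replacement, $target)) {
            throw new InvalidArgumentException("$replacement does not extend $target");
        }
        self::$map[$target] = $replacement;
    }

    public static function resolve(string $class): string
    {
        return self::$map[$class] ?? $class;
    }
}

// Stand-in for "new Foo()" in rewritten source code.
function newinstance(string $class, ...$args): object
{
    $resolved = PoseRegistry::resolve($class);
    return new $resolved(...$args);
}

class Foo
{
    public function name(): string
    {
        return 'Foo';
    }
}

class FooStub extends Foo
{
    public function name(): string
    {
        return 'FooStub';
    }
}

PoseRegistry::pose('Foo', 'FooStub');
$obj = newinstance('Foo');
echo get_class($obj), "\n";
```

Because the replacement must extend the target, the resulting object still satisfies a Foo typehint.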

Sebastian argued that you can't refactor third-party code to comply with a given pattern; class posing becomes an essential item at that point. Besides, new Foo() could only produce objects that are in an is_a relationship with Foo due to the restriction he'd mentioned earlier, allowing typehints to continue working. Reflection and __autoload() would continue to work without special considerations.

Arne Blankerts pointed out that that restriction actually conflicts with Sebastian's earlier statement that it's possible to override final classes. He suspected it would also 'cause serious trouble with anything marked private' in the original class. Sebastian corrected himself; a class may pose for a target class that contains final methods. The posing class itself would be unable to override final methods of its parent. Marcus agreed with Arne that final should always be respected, and wrote that 'we need to investigate further' concerning the behaviour of private.

Johannes caught up with the thread, and explained that the implementation Sebastian had mentioned had been 'simply a conference hack' to see whether class posing was possible and allow Sebastian to test it; 'there wasn't much thinking involved'. (Heh.) In an extension - which this implementation is - the best approach Johannes had found was to use registration. A core implementation would allow a call to __new() to be cached during the declaration, which would be better in terms of performance. However, given that performance isn't much of an issue in a test environment, the question was whether the feature is actually needed in the core or would be best implemented as an extension? Johannes' own feeling was that installing an extension should present no problem for anyone able to perform unit tests using mock objects. He also wondered about the feasibility of taking 'the code coverage stuff' available in Xdebug and the ZEND_NEW overloading implemented here, along with some other bits and pieces, to create a phpunit extension. However, this was hardly a topic for discussion on the internals list...

Short version: Watch out for PHPUnit arriving soon in a PECL near you (maybe).

RFC: VS 2005 support

Pierre-Alain Joye raised the subject of dropping support for the Microsoft Visual Studio 6.0 compiler in PHP 5.3. Although it would have 'a couple of side effects', this would be a one-time job that would make our lives easier when dealing with Windows ever after.

Richard Quadling immediately asked if it wouldn't be better to target the MSVS 2005 Express Edition, which is a free-as-in-beer compiler. Daniel Brown wondered which side effects Pierre anticipated? He also backed Richard's point about targeting the Express Edition, but didn't know what impact this might have in the long term.

Rob Richards recommended stringent testing. He'd recently run into an issue when running an application built with VS 2005 and using DLLs built with older compilers - the runtime linking is different - and PHP has an awful lot of third party DLLs to consider. The particular issue Rob had found was when the VS 2005-built application created a pointer to a file using fopen(), which was then passed to a DLL built with an older MSVS version, which then called fwrite(). The ensuing crash was caused by the clash of two incompatible runtimes.

Stas noted that non-CL builds aren't supported at all yet, but the PHP build should be able to use cl.exe with any of the targets mentioned so far. Did anything actually need changing for VS 2005? He added that there may be some runtime issues; VS 2005 links with shared libraries that might well be missing from older systems. This would need verification. Marcus suggested either targeting VS 2003 (huh?) or supplying the dependencies alongside PHP.

Andi thought it was worth another go ('another' because PHP's official Windows distributions dude, Edin Kadribasic, tried this some time ago.) In Andi's experience, VS 2005 binaries are 'significantly faster' than VC6 binaries. He added that the Zend team have already tackled some of the issues arising from an in-house upgrade, and would be able to help out with those.

Marcus just wanted to drop 'all the VC6 build files' - presumably referring to the .dsp files used by Visual Studio, rather than the generic CL build system that can be used to build PHP under Windows regardless of compiler version. He would like to have VS 2002 and VS 2003 work as well, 'and VS 2007 is at the door already'. Pierre explained about the build system, and that VS 2003 already works fine without any changes. He would be happy to kill the .dsp files; 'only Stas (or Dmitry?) has given them some love lately'. However, the thing about VS 2005 was that it would require 'a couple of important changes' to the build system, e.g. manifest support. Pierre had never tried the VS 2007 beta, but believed that targeting VS 2005 would be adequate preparation for it.

Nuno Lopes intervened to report that he'd been using VS 2005 to build PHP for quite some time, with no changes and no problems. He'd even built PHP against some of Edin's VC6-compiled third party libraries, again with no issues. That said, he wasn't using his homemade binaries in production - just for debugging purposes. Andi reiterated Rob's earlier point: unless all the third party libraries are also compiled with VS 2005, there is a real chance of problems arising when the data structures in the different runtimes clash. Apache man William A. Rowe noted that it isn't just the data structures; one C runtime (CRT) can have localized resources that aren't visible to others. He cited 'the faux-posix I/O' as a prime example of this.

Turning to the subject of Apache, William explained that the httpd binaries shipped by the ASF are built using VC6, and will remain so for the lifespan of Apache 2.0/2.2. There was a fair chance of their moving to VS 2005 for Apache 2.4, but - given the number of C library issues occurring 'in each iteration' of the compiler - the Apache builds are unlikely to upgrade beyond that any time soon. He added that Perl is still shipped on the VC6 runtime, and Python on the VS 2003 runtime. It would be a game of cat and mouse unless or until everyone moved to VS 2005. The important thing was to clean up the .pdb files so that they would import cleanly; without them it's no longer possible to export .mak build files for use outside the MSVS environment.

Andi mentioned that it's possible to de-couple Apache from PHP by simply using FastCGI. Marcus wrote that MS also recommend that approach. He went on to echo William's point that struct sizes aren't generally an issue in the Windows API, so much as 'the POSIX stuff' and new functions that don't exist in the older runtimes. Memory allocation is a particular problem; you have to bind statically, and blocks malloc'd in one module can't be freed elsewhere. William replied that it's possible to work around the allocation issues, so long as the modules are well partitioned and have full responsibility for freeing their own memory. However, one thing that would trip up any project was binding tightly to third party libraries. OpenSSL, in particular, can create 'a mess all its own' if compiled to use a different CRT.

Short version: Spot the missing developer.

BUG: import NAME conflict

Benjamin Schulz reported that:

import Foo::Bar as DomDocument;
import Foo::Exception;
import MyStuff::Dom::XsltProcessor;

resulted in a fatal error, "Import name '...' conflicts with defined class". He was somewhat bewildered by this; naturally he'd like to be able to refer to Foo::Exception as simply Exception within his own application. He wouldn't be using a namespace there otherwise. Was there some good reason for this behaviour?

Greg replied that this works fine so long as you import the global class too:

import ::Exception as Notused;
import Foo::Exception;

He agreed that this wasn't very intuitive, but at least it was simple. Stas mentioned that he wouldn't recommend such code, but didn't really have an alternative to offer. Markus Fischer pointed out that you wouldn't necessarily know the names of global classes in third-party libraries in advance anyway.

Benjamin wrote bluntly that Greg's simple solution made the entire concept seem broken. Greg retorted that import is best used within a namespace, and if Benjamin just did that there'd be no need to import global classes. Marcus wondered how Benjamin was defining his Exception class? Something like:

class Exception extends ::exception { }

would be pretty ugly; besides, that class replaces a core functionality. Why didn't he just use the built-in Exception class, if his own version was so general that it needed that name? Extended Exception classes should be specialized anyway, in Marcus' view. Benjamin pointed out that the problem was nothing to do with exceptions per se, it was more about global class names that he didn't want to know about in his own namespaced code. Did he really need to rename his namespaced Exception class GenericException? How about GenericXsltProcessor? What about future SPL classes? What happens when a PECL extension declares a class in global space that hasn't even been thought of yet? This implementation guaranteed that future PHP releases or extensions would break even applications that use namespaces.

It took an embarrassingly long time for Benjamin to get his point across, even with Markus and Moritz Bechler's assistance, mainly because he'd made the mistake of using an inherited Exception class to illustrate the problem. Poor lad.

Greg realized a few days later that 'Benjamin has in fact unearthed a bug in the implementation of import', and explained it all over again... still using Exception:

namespace Foo;
import Blah::Exception;
$a = new Exception;

should be equivalent to:

namespace Foo;
import Blah::Exception as Foo::Exception;
$a = new Foo::Exception;

but wasn't. Greg also posted hints for a fix, as he didn't have time to put the patch together to fix it himself.

Stas argued about what that piece of code should be equivalent to, but you could tell his heart wasn't in it by the way he ended up agreeing with Greg. Thankfully, Stas was able to translate the problem into pure internals-speak: 'unqualified lookups inside namespace should also take imports into account'.

Dmitry still had issues with the example, and asked for another. Greg obliged, and once more explained the fix. Later, he decided it would be quicker to put both the example and the patch in a bug report than keep explaining both. Dmitry double-checked, and confirmed that both the bug report and the patch appeared correct. He promised to give them closer attention, and thanked Greg for his efforts.

Short version: The PEBCAK that wasn't.

TLK: Exceptions in autoload

Moving swiftly on from there, Greg asked if anyone had a link to the archive where it's explained why the executor is unstable after an exception is thrown in __autoload(). Marcus didn't recall any particular thread, but did recall Andi explaining why some things are unstable when exceptions are pending. Much had changed since then, and Marcus thought now might be a good time to re-investigate the issue. However, if it meant ending up with a Java-like exception stack, he'd rather have PHP's current behaviour. Greg agreed; you can get around the problem of a fatal error arising from an uncaught exception in the current setup, to some extent, by using die(new Exception(...)). He was just hoping to understand __autoload() internals a little better.

In fact, Greg was hoping to avoid using that rather dodgy die() trick. He was offering (in another thread) a patch that introduced a new function, in_class_exists(), which would return TRUE if __autoload() were called by class_exists(). It would allow his autoload handler to return if the existence of a class were queried, and die with the appropriate exception where an E_ERROR would normally result.

Marcus was bemused; class_exists() should simply return FALSE on failure. Why make it more complicated? Was Greg hoping to avoid the time taken to load and compile the class? Wouldn't he need to do that at some point anyway, if the class existed? Besides, it's possible to avoid a call to __autoload() altogether by calling:

class_exists($classname, false);
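The effect of that second argument can be seen with a small experiment. This sketch registers the loader through spl_autoload_register() rather than defining __autoload(), since __autoload() was removed in PHP 8.0; the class names are placeholders chosen so that they do not exist.

```php
<?php
// Record every class name the autoloader is asked for.
$autoloadCalls = [];
spl_autoload_register(function (string $class) use (&$autoloadCalls): void {
    $autoloadCalls[] = $class;
});

// Default behaviour: the autoloader is consulted before giving up.
$withAutoload = class_exists('NoSuchClassOne');

// Second argument false: the autoloader is not consulted at all.
$withoutAutoload = class_exists('NoSuchClassTwo', false);

var_dump($withAutoload, $withoutAutoload, $autoloadCalls);
```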

Greg explained the scenario he was trying to deal with:

new PEAR2 user Joe User downloads PEAR2 package Blah, which depends
on package Foo, but does not download Foo/something happens and Foo
is erased accidentally/whatever.

Joe, not knowing anything about PEAR2 package Blah, is just trying
it out to see how it works, and so does the drill of:

<?php

include '/path/to/PEAR2/Autoload.php';

$a = new Blah;
$a->doSomething();
$a->doSomethingElse();

?>

This script results in: Fatal Error: class PEAR2::Foo not found in
/path/to/PEAR2/Blah/SomeinternalClass.php on Line XX

Not very useful. PEAR2 knows what the problem is, and also where Foo should be, but can't pass along that information to Joe since there's no way to safely pass error information out of __autoload(). Basically, __autoload() just isn't very helpful when it comes to debugging, and adding the ability to check whether a call to class_exists() is the source of the autoloading would go a long way to resolving that problem. Simply not allowing class_exists() to call __autoload(), on the other hand, would not.

Johannes suggested that

function __autoload($a) {
    $bt = debug_backtrace();
    if ($bt[1]["function"] == "class_exists") {
        echo "in get_class";
    }
}

should work quite well, without the need for obscure engine hacks. Greg explained that he does that already. He saw it as 'an ugly, unnecessary hack with lots of potential pitfalls caused by the inability to customize an error message when a class doesn't exist'; it also has a nasty performance hit. Johannes doubted that performance is important in an error handler; besides, he reckoned the performance issue there was a result of the search through the file system anyway. Marcus backed him; after all, the message is only shown when the target class is not present.
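A variation on Johannes' suggestion that runs on current PHP versions might look like the sketch below. It makes two assumptions worth flagging: the loader is registered with spl_autoload_register() because __autoload() no longer exists in PHP 8, and the backtrace is searched frame by frame instead of relying on index 1, since the position of the class_exists frame may differ between PHP versions. The helper name called_from_class_exists() is invented for this example.

```php
<?php
// Hypothetical helper: true when any frame in the trace is class_exists().
function called_from_class_exists(array $trace): bool
{
    foreach ($trace as $frame) {
        if (($frame['function'] ?? '') === 'class_exists') {
            return true;
        }
    }
    return false;
}

// Remember, per class name, whether the lookup came from class_exists().
$seen = [];
spl_autoload_register(function (string $class) use (&$seen): void {
    $seen[$class] = called_from_class_exists(debug_backtrace());
});

class_exists('MissingForProbe');
var_dump($seen);
```

A handler built this way could return quietly for existence checks and produce a more helpful message otherwise, which is what Greg was after; his objection that it is an ugly workaround still stands.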

Short version: Best investigate the stability issue then.

TLK: Taint support: first results

Wietse Venema, who proposed taint support for PHP almost a year ago, wrote to internals@ to update the development team on his progress. Although he'd had to adapt his original plan somewhat, he now had an initial implementation that adds taint support to the core, a selection of built-in functions and a couple of extensions. The good news was that performance was much better than anticipated; Wietse reported an overhead for make test in the 1% - 2% range. He planned to release his work for review in the near future, since feedback was now needed. Although rough at present, the code already had the potential to manage such tasks as labeling sensitive data.

The taint implementation is controlled by a single INI directive, taint_error_level, which is an INI_ALL setting. Setting it to, for example, E_WARNING would make this script:

<?php

$username = $_GET['username'];
echo "Welcome back, $username\n";

?>

output:

Welcome back, xxx
Warning: echo(): Argument contains data that is not converted with htmlspecialchars() or htmlentities() in /path/to/script on line 3

The directive would of course be switched off by default. Taint mode aims to be context sensitive, offering up advice about escapeshellcmd() and mysqli_real_escape_string() where appropriate. This is achieved by adding binary properties (bits) to the unused areas of the zval struct. The bits currently are named TC_HTML, TC_SHELL, TC_MYSQL, TC_MYSQLI, TC_SELF, TC_USER1 and TC_USER2; there is room for at least another 16 bits, assuming a 32-bit compiler. These bits are set internally, using taint_marks_* or taint_checks_* parameters as appropriate. (This part wasn't very clear to me either - it looks as if new internal macros are used.) There is no interface for the TC_USER* bits at present, but the plan is to make them available at application level. Wietse went on to explain the propagation rules (conversions from integer to string and pure arithmetic or string operations retain all taint bits; conversions from string to integer remove most of them; comparison operators ignore them) before covering the problem areas he'd already identified. These included functions like parse_str(), and the problem of empty strings. Also, support for tainted objects is not yet complete, and object-to-other type conversions, in particular, may lose taint bits. Finally, Wietse warned that those areas of PHP (and extensions) that don't use the correct macros to initialize zval structures are likely to be problematic when taint checking is turned on, since they will leave taint bits at uninitialized values.
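Since the bit handling wasn't entirely clear from the post, here is a userland model of the general idea rather than of Wietse's patch. Everything in it is an assumption made for illustration: the numeric values of the constants, the Tainted wrapper class, the helper functions, and the reading that a set bit means "not yet made safe for this context". The real implementation stores its bits inside the zval and is not visible to scripts. Typed properties require PHP 7.4 or later.

```php
<?php
// Bit names taken from the summary; the values are arbitrary for this sketch.
const TC_HTML  = 1 << 0;
const TC_SHELL = 1 << 1;
const TC_MYSQL = 1 << 2;

// Hypothetical wrapper pairing a string with its taint marks.
final class Tainted
{
    public string $value;
    public int $marks;

    public function __construct(string $value, int $marks)
    {
        $this->value = $value;
        $this->marks = $marks;
    }
}

// String operations retain all taint bits: the result carries the union.
function taint_concat(Tainted $a, Tainted $b): Tainted
{
    return new Tainted($a->value . $b->value, $a->marks | $b->marks);
}

// Escaping for one context clears only that context's bit.
function taint_html_escape(Tainted $t): Tainted
{
    return new Tainted(htmlspecialchars($t->value), $t->marks & ~TC_HTML);
}

// A value is acceptable in a context when that context's bit is clear.
function taint_is_safe_for(Tainted $t, int $context): bool
{
    return ($t->marks & $context) === 0;
}

$input    = new Tainted('<b>', TC_HTML | TC_SHELL);
$greeting = taint_concat(new Tainted('Hi ', 0), $input);
$escaped  = taint_html_escape($greeting);

var_dump(taint_is_safe_for($greeting, TC_HTML));
var_dump(taint_is_safe_for($escaped, TC_HTML));
var_dump(taint_is_safe_for($escaped, TC_SHELL));
```

One bit per context is what would let the warning name the right escaping function for the place the data ends up.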

David Wang was a little twitchy about those spare zval bits; he uses three of them in his garbage collection patch. He agreed that there is room for a lot of free bits, but thought Wietse should be aware that increasing the size of the zval struct leads to L1 cache misses. Wietse copy-pasted the paragraph he'd written about the 16 extra bits, and explained that he'd found micro benchmarks overly processor dependent - hence his choice of macro benchmarking.

Marcus wrote that he liked the INI approach, and of course the benchmark results. He was less certain about the database specificity; if it couldn't be avoided, having 'something for PDO' would be a good move. He didn't like the name TC_SELF for something that checks the source of calls to internal control operations such as eval(), and suggested that TC_PHP might be more appropriate. Regarding the evidence of poor macro usage, Marcus mentioned that David had encountered the same problem. Marcus had come to the conclusion that macro usage should be enforced, and direct access to zval members disallowed, by evil means (new zval member prefixes to break non-compliant code). The nice way to do it is too slow...

PHP user Laurent Jouanneau pointed out that a PHP application can generate several kinds of output other than HTML: JSON, CSV and PDF, to name but a few. How could Wietse's code guess the output type? And if it couldn't, how could the warning be disabled where it was inappropriate? Wietse wrote something flippant about not using echo. M. Sokolewicz called him on it, and gained the explanation that the code to create PDF output doesn't have taint-labeled data; the labels actually need to be put there at the point of data creation. Rasmus Lerdorf commented that this didn't make much sense to him, and gave Wietse a much-abbreviated chunk of common-enough PHP code to consider:

$user_data = $_REQUEST['data'];
switch ($output_format) {
    case 'html':
        echo "<html>$user_data</html>";
        break;
    case 'xml':
        header('Content-type: text/xml');
        echo "<xml>$user_data</xml>";
        break;
    case 'json':
        header('Content-type: application/json');
        echo json_encode(array($user_data));
        break;
}

'$user_data is tainted, but the untainting rules are very different for those three cases, and... an error that talks about HTML escaping only makes sense in the html case.'

Wietse wondered where 'the output format feature' was documented. Rasmus educated him. It didn't take too long before Wietse asked about the Content-Type header. Rasmus agreed that this would work in most cases, but added that output buffering would break it, since echo doesn't output anything before the output buffer is flushed - and it might never be flushed. Wietse acknowledged that the practice of setting Content-Type immediately prior to flushing the buffer would be incompatible with taint checks. It would also be 'prohibitively expensive' to apply taint policy to the contents of the output buffer; you'd need to record which function, argument, file and line each byte of data came from, as well as its taint labels.

Stut suggested that taint should simply assume HTML but provide a way to specify otherwise, either with a specific function call or via ini_set(). Wietse felt it best not to overburden the interface with switches and functions; he preferred to rely on header information to pick up the MIME type. 'I just need to hook into the header() function and do a little parsing', he wrote, thanking Rasmus for his explanation. Stut meanwhile was trying to think of a situation where the tainting might get in the way of determining the requested output format, but couldn't come up with any.
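
The 'little parsing' Wietse had in mind might look something like the sketch below. It is a guess at the general shape, not his code: both function names and the mapping from MIME type to escaping context are hypothetical, and falling back to HTML when no Content-Type has been seen simply follows Stut's suggestion.

```php
<?php
// Hypothetical helper: pull the MIME type out of a raw header line,
// as a taint-aware header() hook might. Returns null for other headers.
function mime_type_from_header(string $header): ?string
{
    if (!preg_match('/^content-type\s*:\s*([^;\s]+)/i', $header, $m)) {
        return null;
    }
    return strtolower($m[1]);
}

// Hypothetical mapping from MIME type to the escaping context that
// a taint check would apply to echoed data.
function taint_context_for(?string $mime): string
{
    if ($mime === null || $mime === 'text/html') {
        return 'html';   // assume HTML when nothing else was declared
    }
    if ($mime === 'text/xml' || $mime === 'application/xml') {
        return 'xml';
    }
    if ($mime === 'application/json') {
        return 'json';
    }
    return 'none';       // unknown type: no HTML-specific warning
}

$mime    = mime_type_from_header('Content-type: application/json; charset=utf-8');
$context = taint_context_for($mime);
```

As Rasmus noted, this only helps when the header is sent before the data is echoed; with output buffering the header may arrive after the taint decision has already been made.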

Greg wondered if TC_SELF would be applied to stream data. Wietse explained that it wouldn't by default, but could be configured that way. For the time being, only data from the Web is treated as hostile; all other external data simply needs to be escaped when used in HTML, shell or SQL. In case anybody wasn't clear on this point, he added that the current taint bits and marking policies are simply a first step; they are liable to change as Wietse becomes more aware of common practices in PHP.

Short version: If you're planning to test this, be aware that it's not pretty just yet.

CVS: Getopt in, ereg out of the core

Changes in CVS that you should probably be aware of include:

  • Zend Engine bugs #42798 (__autoload() not triggered for classes used in method signature), #42802 (Namespace not supported in typehints), #42819 (namespaces in indexes of constant arrays) and #42820 (defined() on constant with namespace prefixes tries to load class) were fixed in 5_3 and HEAD [Dmitry]
  • Core bugs #42789 (join() warning messages are not proper and different return value in PHP 5/6) and #42142 (substr_replace() returns FALSE) were fixed [Jani]
  • getopt() is now available on Windows, following its move from SAPI level to the PHP core in 5_3 and HEAD. It also now supports long options. [written by David Soria Parra, committed by Jani]
  • PHPAPI function php_prefix_varname() is now available in the PHP_5_3 branch (affects internals only) [Jani]
  • In ext/json, bug #42785 (json_encode() formats doubles according to locale rather than following standard syntax) was fixed in 5_2, 5_3 and CVS HEAD [Ilia]
  • In ext/xsl a new method for profiling stylesheets, xsl->setProfiling(), is available in the 5_3 branch and CVS HEAD [Christian 'Chregu' Stocker]
  • Core bug #42752 was fixed in 5_2, 5_3 and HEAD following improvements to the recursion detection in array_walk() (memleaks remain.) [Tony]
  • In CVS HEAD, strcspn() now behaves the same way in both Unicode and native mode, fixing bug #42731 [Tony]
  • Zend Engine bugs #42772 (Storing $this in a static var fails while handling a cast to string) and #42818 ($foo = clone(array()); leaks memory) were fixed in PHP_5_2, PHP_5_3 and CVS HEAD [Dmitry]
  • The internal function php_fgetcsv() gained an escape parameter in PHP_5_3 and CVS HEAD, closing bug #40501. This change impacts core function fgetcsv(), and the SplFileObject methods fgetcsv() and setCsvControl(). The default setting in all cases is '\\'. [David Soria Parra]
  • \u, \U and \C are no longer supported in single quotes in CVS HEAD, closing bug #42746 [Tony]
  • In ext/pgsql, bug #42783 (pg_insert() does not accept an empty list for insertion) was fixed in 5_2, 5_3 and HEAD [Ilia]
  • Zend Engine bug #42817 (clone() on a non-object does not result in a fatal error) was fixed in 5_2, 5_3 and HEAD [Ilia]
  • lcov 1.6 is now officially supported in 5_2, 5_3 and HEAD [Nuno]
  • The core regex functions dating back to PHP 3 (ereg[i](), ereg[i]_replace(), split[i]() and sql_regcase()) were moved to their own extension, ext/ereg, from 5_3 [Jani]
  • In ext/ldap, ldap_set_option() gained two new possibilities, LDAP_OPT_NETWORK_TIMEOUT or (for the Netscape LDAP SDK) LDAP_X_OPT_CONNECT_TIMEOUT in 5_3 and HEAD, fulfilling feature request #42837 [Jani]
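
For anyone who hasn't used getopt() with long options yet, here is a minimal sketch. The option names and the cli_option() helper are made up for the example; only getopt() itself, with its second argument for long options, is part of PHP.

```php
<?php
// Short options: -v (flag) and -o <file> (value required).
// Long options: --verbose (flag) and --output <file> (value required).
// The option names are invented for this example.
$options = getopt('vo:', array('verbose', 'output:'));
if ($options === false) {
    $options = array();   // getopt() returns false on failure
}

// Hypothetical helper: look an option up under any of its names.
// getopt() stores false for a flag given without a value, so a
// flag that is present is reported as true.
function cli_option(array $options, array $names, $default = null)
{
    foreach ($names as $name) {
        if (array_key_exists($name, $options)) {
            return $options[$name] === false ? true : $options[$name];
        }
    }
    return $default;
}

$verbose = cli_option($options, array('v', 'verbose'), false);
$output  = cli_option($options, array('o', 'output'), 'php://stdout');
```

A trailing colon marks an option that requires a value, in the short string and the long array alike.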

In other CVS news, Pierre formally crowned our new RM; he gave Johannes write access to all relevant cvs.php.net modules.

Short version: A stupidly busy week.

PAT: Missed one (or two)

David Soria Parra had a bit of a week of it, with two successful patches going into CVS as reported above and a third, unsuccessful patch nonetheless leading to a Zend Engine fix.

One Bill Moran notified the list that he'd added a one-line fix to the report for bug #42637 (SoapFault: Only http and https are allowed), and hoped to see it checked in before the next PHP_5_2 branch release. As you'll gather from the length of this summary, this was a pretty hectic week on internals@; Bill's patch escaped list attention.

Rui Hirokawa continued the world's most drawn-out conversation. (Did you ever see that Red Dwarf episode where the mail pod arrives three million years late and one of the parcels contains a video of a chess move? The first move?) Actually, from this post it looked like Rui had reached a possible solution to the __HALT_COMPILER() problem in ext/mbstring. He offered to disable detect_unicode by default, assuming there were no objections. We might have to wait a while for François' reaction, though.

Short version: Bill's patch - and one from Greg earlier - didn't get any feedback.
