BUG: RTLD_DEEPBIND [continued]
TLK: More strict than Liskov
TLK: spl_object_hash docs
TLK: Date Wars revival attempt
BUG: 301 redirects
TLK: allow_url_include and php:/data:
CVS: Starting out with 5.2.1
PAT: Scalar type hinting

BUG: RTLD_DEEPBIND [continued]

RedHat contributor Joe Orton, who was responsible for introducing RTLD_DEEPBIND support into the PHP core, picked up on last week's exchange and immediately asked Brian France the obvious question: 'What problems do you see?'

Brian's reply was less than comforting: 'Where to start?' It seems that the Yahoo! team link their PHP extensions against a homegrown static library, apstubs. This library defines all exported Apache elements as 'weak'. If a PHP extension written for Apache were run under Apache, the 'strong' symbols from the server library would override the 'weak' ones in apstubs, but those weak symbols would be utilized if the extension ran under the CLI SAPI. RTLD_DEEPBIND forces PHP to use the weak apstubs symbols even when the strong Apache symbols are present and, while Brian perfectly understood the reasoning behind this in PHP code generally, this broke the behaviour the Yahoo! crew rely upon.

A second, more generic issue the team had found was that, when RTLD_DEEPBIND is defined, the rtld couldn't find the symbols from a library that had already been loaded. Brian was having trouble getting to the root of this issue, but guessed that libraries were not being unloaded between the first and second pass of Apache startup. Turning off RTLD_DEEPBIND prevented the ensuing core dump, but not the unloading issue itself. There was a related bug report open against RedHat, and Brian was about to test whether removing RTLD_DEEPBIND resolved that too. He added that there would be either a new or updated RH bug later, should the PHP or FreeBSD kernel teams at Y! find anything solid.

Short version: The first doesn't affect PHP generally; the second might.

TLK: More strict than Liskov

Christian Schneider was bemused. Some weeks ago, he supplied an OO strictness patch to prevent PHP complaining about added default parameter values or changes to static methods that do not break the Liskov substitution principle. He had had no feedback over this, at all. Why, Christian wanted to know, had his patch not been considered?

Marcus Börger explained that the Liskov principle applies to static methods when calls via objects are common, which is the case with PHP. As with many other languages, PHP's static methods are inherited in the same way as any other method. Adherence to Liskov rules would mean that PHP users could add default parameter values, add new parameters with default values or change type hints in derived class methods. C++ had already proved that allowing users to alter default parameter values was not a good idea in a compiled language; it resulted in unexpected default values. Although it had a better chance of working as advertised in PHP, allowing it would confuse programmers coming from C++, and could prevent optimizations in the future if PHP ever adopted the C++ idea of compile time function invocation binding. The same was true when it came to allowing additional parameters; although it could work in PHP, it would be more confusing than helpful. Finally, the ability to change type hints in derived class methods had been decided against at PDM; none of the core team knew of a language that supports it, and its rules are non-intuitive.
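To make the disputed case concrete, here is a minimal sketch of a derived method that adds a default parameter value. The class and function names are invented for illustration. According to the thread, PHP at the time complained about this under E_STRICT; other PHP versions may treat it differently.

```php
<?php
// Hypothetical classes, used only to illustrate the case discussed above.
class Greeter
{
    public function greet($name)
    {
        return "Hello, $name";
    }
}

class FriendlyGreeter extends Greeter
{
    // The override adds a default value. Every call that is valid on
    // Greeter::greet() is still valid here, so substitution is not broken.
    public function greet($name = 'world')
    {
        return "Hello, $name";
    }
}

// Code written against the base class keeps working with the derived class.
function welcome(Greeter $greeter)
{
    return $greeter->greet('Christian');
}

echo welcome(new FriendlyGreeter()), "\n";
```

The point of contention was not whether such code can work, but whether allowing it is worth the potential confusion.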

Marcus moved his focus to E_DEPRECATED; it still wasn't known whether this would be introduced in PHP 5.3.0 or PHP 6.0.0. The team would appreciate help to make it come about quickly.

Christian wrote simply that he hadn't realized how big an impact C++ has on PHP design, and was very glad they were only discussing E_STRICT! He wanted to know what kind of help the team needed, and offered to put some time into making E_DEPRECATED possible.

Marcus explained that language designers often copy from each other; PHP's own object model was a combination of influences from C++, Java, Delphi, a little C#, maybe some Python, and some stuff he couldn't even specify a counterpart for. He'd only mentioned C++ on this occasion because the specific behaviour under discussion could be best illustrated that way, and in his experience most people prefer explanations by example to tracts of theory. However, if anyone happened to prefer theory, he recommended two books: Palsberg and Schwartzbach's Object-Oriented Type Systems ('easy'), and Abadi and Cardelli's A Theory of Objects ('a pretty mathematical approach'). Returning to the issue of E_DEPRECATED, Marcus felt that a list of the existing E_STRICT messages would be very useful, and looked forward to seeing it. Christian posted a shell script to produce such a list, along with a small prayer that the core team wouldn't fall into the trap of not seeing the great potential in being different from other languages.

PHP user Jeff Moore thanked Marcus for his 'enlightening explanation', commenting that basically what it came down to was that PHP's E_STRICT rules are intentionally more strict than Liskov. Marcus agreed that this was the case, and referred back to the PDM decision over type hints and non-intuitive behaviour.

Short version: More than you ever wanted to know about OO theory and the politics of language design.

TLK: spl_object_hash docs

Stas Malyshev had come across the new spl_object_hash() function, which he welcomed, but had then discovered that it wasn't documented in the PHP manual. Was there some reason for that?

Hannes was quick to write about the 'human resource problem' for the benefit of anyone passing. Marcus, however, explained that in this case the lack of documentation was simply an oversight. He added that the function simply generates a unique md5 string from an object's id and object handler address.
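A short sketch of how the function might be used follows. It relies only on spl_object_hash() itself; the variable names are illustrative, and the way the hash is derived internally may differ between PHP versions.

```php
<?php
$first  = new stdClass();
$second = new stdClass();

$hashFirst = spl_object_hash($first);

// The same object yields the same hash for as long as it is alive.
var_dump($hashFirst === spl_object_hash($first));

// Two objects that are alive at the same time yield different hashes.
var_dump($hashFirst !== spl_object_hash($second));

// A typical use: the hash as an array key for tracking objects already seen.
$seen = array();
$seen[$hashFirst] = true;
var_dump(isset($seen[spl_object_hash($first)]));
var_dump(isset($seen[spl_object_hash($second)]));
```

Note that a hash is only meaningful while its object exists; it should not be treated as a permanent identifier.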

Short version: Freshly baked docs are now available!

TLK: Date Wars revival attempt

Mauro N. Infantino wrote to internals@ with a problem. In his team's PHP 5 code base, there happened to be a class named DateTime. Assuming they were not alone in this, Mauro wondered if there was any chance an INI directive might be added into PHP that would offer the option of disabling the new class definition. He went on to say that they'd had a lot of PHP 4 projects, all of which had migrated to PHP 5.1 with surprising ease, but changing every date creation - and the type hints - would be a nightmare. 'How could it be more difficult to upgrade from PHP 5.1 to PHP 5.2?'

Rasmus Lerdorf explained briefly that far more users had a class named Date, and the PHP internal class name had already been changed to DateTime because of this. Mauro would simply have to change his own class. Ilia Alshanetsky, more sympathetically, introduced Mauro to the concept of prefixing class names, and explained that PHP was likely to introduce more internal classes into its core in the foreseeable future. That said, if there was no option but to disable it, Mauro could edit the PHP source and rename the native class to something like DateTime2.
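The prefixing approach Ilia described might look like the sketch below. The 'Acme_' prefix and the class body are hypothetical, standing in for whatever an application would actually use.

```php
<?php
// 'Acme_' is a made-up application prefix. The class no longer competes
// with the core DateTime class for the same name.
class Acme_DateTime
{
    private $timestamp;

    public function __construct($timestamp)
    {
        $this->timestamp = (int) $timestamp;
    }

    public function format($format)
    {
        // gmdate() keeps the output independent of the server time zone.
        return gmdate($format, $this->timestamp);
    }
}

$stamp = new Acme_DateTime(0);
echo $stamp->format('Y-m-d'), "\n";

// The core class is still there, untouched (PHP 5.2 and later).
var_dump(class_exists('DateTime', false));
```

The cost, as the rest of the thread makes clear, is that every type hint and instantiation in existing code has to be updated to the new name.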

Christian wrote that he was still wary of the prefix approach; it puts the burden onto PHP users, who write more code than is in the more controllable core and libraries of PHP. He felt strongly that core classes should be namespaced, full stop. Ilia, who believes equally strongly that prefixing is an elementary rule in OO programming, dismissed this argument. The language, he wrote, should always have the best possible names, simplifying development for the majority of its users. However popular an application might be, PHP is more popular; the needs of the many outweigh the needs of the few. Christian partly agreed with him, but pointed out that 'the many' now need to care about naming; this extra rule, in his opinion, makes the language a little less simple to use.

Marcus intervened to point out that, while Date, DateTime or Time all are common names, prefixing core class names with Php was just as likely to be problematic. Should the PHP dev team have given the core class a ridiculous name to avoid the problem? Wasn't it easier for users not to have to check the manual every time they wanted to use a core class? Christian was willing to bet that 'the number of PhpDate classes out there is considerably smaller than the number of Date classes'. He believed that having a standard prefix in the core could even allow the team to add an E_STRICT warning when userspace classes with the same prefix were used. His favourite solution would be to have namespaces handling the whole situation, rather than educate users to prefix their own classes.

PHP user Kevin Waterson pointed out that, since PHP doesn't support namespaces, the users should be prefixing. Ron Korving argued that it wasn't possible to rely on application developers to do this. He agreed with Christian's view that 'prefixing application classes is a big burden compared to prefixing PHP core classes'. Furthermore, with the rising popularity of the ActiveRecord design pattern, database table names are starting to dictate class names. PHP needs to change, not the user approach; PHP needs namespace support. Lukas Smith pointed out that prefixing is trivial; 'all people need to do is stick a single underscore into their class name. That's all.' As for the ActiveRecord issue, he didn't see a problem with applying a default prefix. Finally, he'd yet to hear any of the PHP core developers say they were against namespace support...

Ilia pointed out that the discussion over what to name the core datetime class had been held in the public domain, and the options on offer had been based on the team's analysis of the class names currently used in application code. A Google search on DateTime had turned up only about eight PHP applications; Ilia felt this confirmed the soundness of the choice. Besides, users don't have to upgrade immediately, and shared hosts are very rarely up to date. PHP user Lester Caine argued that anyone using a shared host would need to fix their code regardless, because a host can update at whim. Those using third party applications wouldn't even know they had a problem until the host upgraded, never mind how to fix it. He felt there should be some mechanism in place to override core class names, simply because of this. Marcus pointed out that exactly the same issues had existed with core function names for years, and wondered why it's suddenly such a big problem when it comes to class names? There is a clear set of rules for core class naming, and Marcus couldn't see how they offered the user any less protection than the pre-existing rules for core function naming.

Ilia and Christian meanwhile argued bitterly between themselves over the rights of the language developer versus the rights of the application developer, with Ilia insisting that 'the language always has the best pick of namespaces... ANY language'. He was still angry over the compromise solution that allowed PEAR to retain the Date class, and hoped this 'horrible mistake' would not be repeated in future. Naturally this brought Pierre-Alain Joye to the defense of the 'horrible mistake'; the problem wasn't the name so much as the introduction of the class into the global namespace, without previous warning, in the final days leading up to a minor release. Things are getting better, in that some developers now need to justify the names they use in their extensions; but it would be better still, wrote Pierre, if this rule applied to all.

Lester observed that all his applications had failed with the core DateTime class because his own are named DateTime too. Ironically, he'd chosen that name because Date was the obvious name for the core class. He still didn't see why it had to be hard coded into the core; the class should just be loadable, 'like any other good extensible language'. In fact, in Lester's view, any time there'd been a BC break the ensuing problems could have been avoided by providing a simple switch to enable the new code only if it was needed.

Short version: Read the userland naming guide in the PHP manual to avoid the possibility of future conflicts.

BUG: 301 redirects

One Ian Evans was having problems redirecting pages, and wrote to internals@ to check whether this was a known issue before submitting it as a bug report. He was running PHP 5.1.4 as FastCGI under lighttpd, and had found that his HTTP 301 redirects were returning HTTP 302 instead. His code read:

header("Status: 301 Moved Permanently");
header("Location: mynewurl");
exit();

and his header checker was returning this:

#1 Server Response: oldurl
HTTP Status Code: HTTP/1.0 302 Found
Connection: close
X-Powered-By: PHP/5.1.4
Location: mynewurl

- keeping his old pages in Google rather than using the new location. Ian added that he'd already tried a variety of permutations in the code, following suggestions on the PHP general list, but nothing was working for him.

Various people promptly suggested a variety of permutations in the code, but Ian insisted that all of them - including the correct version,

header("HTTP/1.1 301 Moved Permanently");

returned HTTP 302 under his set-up. Eventually, Edin Kadribasic suggested that this might have more to do with the lighttpd server itself than with PHP - in fact he'd had issues with headers under lighttpd himself. He thought Ian should contact the development team responsible for the lighttpd server code.
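For reference, header() also accepts a response code as its third argument, which sets the status alongside the Location header in a single call. The sketch below uses a made-up helper name; given Edin's diagnosis, there is no reason to assume it would have behaved any differently under Ian's lighttpd set-up.

```php
<?php
// Hypothetical helper: send a permanent redirect and stop the script.
function send_permanent_redirect($url)
{
    // Second argument: replace any earlier header of the same name.
    // Third argument: the HTTP response code to send.
    header('Location: ' . $url, true, 301);
    exit();
}
```

A page would call send_permanent_redirect('mynewurl') before producing any output. Whether the client actually sees a 301 still depends on the SAPI and the Web server passing the status through unchanged.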

Short version: It's not always PHP's fault.

TLK: allow_url_include and php:/data:

Stas had come across a blog entry from Stefan Esser in which the security expert had claimed that the allow_url_fopen and allow_url_include restrictions could easily be worked around by using php: and data: URLs. Realizing that Stefan was correct, Stas felt this should be fixed forthwith. Rasmus agreed; he'd also seen the blog entry, and had even discussed a fix with Wez Furlong earlier. He posted a patch that he believed would catch the cases concerned, and asked if people could double check to make sure it offered protection against php:/data: attacks.
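The php: and data: schemes are ordinary stream wrappers, which is why they can appear anywhere a filename is accepted. A small diagnostic sketch, using only stream_get_wrappers() and ini_get(), shows what a given installation has registered and how the two settings are currently configured; the output will vary from one set-up to another.

```php
<?php
// List the registered stream wrappers and flag the two named in the thread.
$wrappers = stream_get_wrappers();

foreach (array('php', 'data') as $scheme) {
    $state = in_array($scheme, $wrappers, true) ? 'registered' : 'not registered';
    printf("%s: %s\n", $scheme, $state);
}

// Show the current values of the two INI settings under discussion.
foreach (array('allow_url_fopen', 'allow_url_include') as $setting) {
    printf("%s = %s\n", $setting, var_export(ini_get($setting), true));
}
```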

Nuno Lopes was furious with Stefan for blogging the problem rather than alerting the PHP development team or fixing it himself. Stefan defended himself, explaining that he had in fact raised the topic to the core team several months earlier. Nuno, who hadn't been aware of this, immediately apologized for his outburst.

Peter Brodersen wondered whether Rasmus' patch would also prevent requests to a SMB server, e.g. \\10.20.30.40\evil\malicious_php_code.txt? It appeared to him that SMB server requests are regarded as part of the default filesystem wrapper. Nuno noted that this was a Windows only issue, but one that should probably be addressed. Stas wasn't certain whether it could be restricted from the OS side; Ilia felt it would be wrong to consider a networked filesystem as non-local. He pointed out that there's no way to identify them reliably, and if this particular 'perfectly valid usage' were to be disallowed by default a large number of applications could break. Wez disagreed; he thought a random host specified in this way should be treated as suspicious, and had no problem with disallowing includes for Windows paths beginning with a double backslash when allow_url_include is disabled. Ilia wondered what Wez' definition of 'a random host' was here. Peter explained; it would cover any SMB server requested via PHP, e.g. \\smbserver\file.txt, rather than through a device mount in the operating system, e.g. Z:\file.txt.

In fact, Peter had obviously given the issue some thought. He wrote that, although it isn't possible to distinguish between requests to a local SMB server and a non-local server, a file request via one network protocol really shouldn't be any different from a similar request via any other protocol. The task was the same, after all. Peter felt the key lay in mapping allowed SMB servers as local devices through the operating system. Requesting Z:\file.txt would then be perfectly fine, and the responsibility of performing the network operation would belong entirely to the operating system, based on central server administration, rather than to PHP. If you actually needed to fetch files through arbitrary external hosts using PHP, switching on allow_url_include would still be an option.

Ilia argued that there was no way to recognize a SMB device. He was also unhappy about the idea of breaking valid applications that perform operations on networked filesystems. Rasmus explained that the idea was simply to mark SMB servers as is_url - it had nothing to do with performing normal operations on a networked filesystem. 'How many real apps rely on being able to execute code via a SMB include?' he asked, pointing out that Ilia's argument could be made for a localhost HTTP or FTP include, which is also disallowed. If someone can map a remote machine to their local drive, they have effectively configured their valid hosts. After all, 'if a bad guy can mount remote filesystems onto your server... you have bigger problems'. Ilia replied that many real applications will happily install on a SMB share. He'd often seen it done in a Windows environment, and even under Linux for backup purposes, with PHP creating the backup and writing it to the storage machine via SMB. The downside of the offered solution was that not all users are able to mount the SMB system, either through permission restrictions or through lack of know-how. While the latter could be resolved through documentation, the permissions issue would be a bigger problem. Further, there are good reasons not to allow localhost access for HTTP - it opens the Web server to a DoS attack via request loop.

Stas wrote that, if PHP was going to offer a security policy that disallows non-local code, it didn't make much sense to do the disallowing under HTTP only. Rasmus was of like mind; he felt that the policy should be to 'disallow anything that in any way looks like it could be a remote include, even if under the covers it isn't'. Ilia argued that valid usage of require()/include() via a URL is quite unusual across HTTP; the same, in his experience, is not true of SMB. That said, he considered Stas' point about security a valid one; but at the same time, there needed to be consideration of the impact on existing PHP applications of marking smb:// addresses as actual URLs. Every remote code execution hack he'd seen had been HTTP based, because HTTP provides a high degree of anonymity. A SMB hack would require an open SMB share, which was trickier; usually this would translate into an exploited Windows machine that accepts incoming SMB connections.

Stas wondered just how many applications actually need to import includes from foreign systems? It sounded unsafe to him. His assessment was that people don't generally do it on purpose - but he was willing to be educated on that point, if anyone knew better. He also pointed out that SMB can be just as anonymous as HTTP; HTTP is used more by hackers simply because HTTP hosting is more commonly available.

Richard Quadling, as a developer working solely with a Windows network, intervened to give his view. He admitted to regularly using includes via a double backslash rather than a mapped drive; having a restriction on \\ would be a problem for him. However, it was simple to work around, and would make PHP more secure, and on that basis he would be happy with the restriction. He noted that it might pose much more of a problem for shared hosts offering Windows.

Tom Sommer thought of it as a network issue. If the administrator hadn't blocked access to remote SMB servers on the network, s/he was simply asking for trouble. Tom had a similar view when it came to code that includes URLs. Including from network mounts, though, might have valid uses. Stas agreed; you wouldn't need to do anything for the Windows client to allow \\IP\share\file, providing that the box allowed anonymous SMB and there was TCP/IP access to it. On the other hand, setting the share as a mapped drive requires some effort on the part of the client. He therefore felt the line should be drawn at 'letter OK, \\IP not OK' - something, he added happily, that is also easy to do.
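The 'letter OK, \\IP not OK' rule can be sketched in userland as below. This is only an illustration of the policy under assumed names; it is not the check that went into PHP itself, and the function name and scheme pattern are invented for the example.

```php
<?php
// Hypothetical userland check, NOT PHP's internal logic.
function looks_like_remote_include($path)
{
    // UNC paths such as \\10.20.30.40\share\file.txt start with
    // two backslashes.
    if (strncmp($path, '\\\\', 2) === 0) {
        return true;
    }

    // Anything with a scheme prefix (http:, ftp:, php:, data:, ...).
    // Requiring at least two characters before the colon keeps drive
    // letters such as Z: out of this category. Note that this broad
    // pattern also flags file:, which a real policy might allow.
    if (preg_match('/^[a-z][a-z0-9+.\-]+:/i', $path)) {
        return true;
    }

    return false;
}

$samples = array(
    '\\\\10.20.30.40\\evil\\malicious_php_code.txt',
    'Z:\\file.txt',
    'data:text/plain,hello',
    'includes/header.php',
);

foreach ($samples as $sample) {
    printf("%s => %s\n", $sample, looks_like_remote_include($sample) ? 'remote' : 'local');
}
```

The mapped drive passes because, as Rasmus argued, mapping it is an explicit administrative decision made outside PHP.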

Ilia conceded defeat and agreed to add the restriction, since he appeared to be the only one arguing against it.

Short version: \\IP is not OK any more.

CVS: Starting out with 5.2.1

Changes in CVS that you should probably be aware of include:

  • A single last fix in the Zend Engine before PHP 5.2.0 was rolled - bug #39304 (Segmentation fault with list unpacking of string offset) [Dmitry Stogov]
  • The missing basic type handling in json_decode() was backported to PHP_5_2 branch, closing bug report #38680 [Ilia]
  • Core bug #39215 (Inappropriate close of stdin/stdout/stderr) was fixed [Ilia]
  • ext/curl can now be built against libcurl 7.16.0 in all current branches of PHP, closing bug #39354 [Ilia]
  • ext/zip gained a new userspace method (addEmptyDir()) and three new internal methods (zip_stat_init(), zip_error_clear() and zip_file_error_clear()) [Pierre]
  • In ext/dba, bug #38698 (for some keys cdbmake creates corrupted db and cdb can't read valid db) was fixed [Marcus]
  • In ext/mbstring, bug #39364 (Removed warning on empty haystack inside mb_strstr()) was fixed [Ilia]
  • In ext/filter, bug #39358 (INSTALL_HEADERS contains incorrect reference to php_filter.h) was fixed [Ilia]
  • In ext/gd, bugs #39273 (imagecopyresized() may ignore alpha channel) and #39366 (imagerotate() does not use alpha with angles > 45) were fixed [Pierre]
  • Filter support for $_SERVER in the CGI and Apache2 SAPIs was backported to PHP_5_2 branch [Ilia]
  • An optional fourth parameter, n_retries, was added to imap_open() and imap_reopen() in 5_2 and HEAD, fixing bug #39362 [Ilia]
  • Internals folk will be happy to know that the hash_apply functions are more consistent now, closing bug #39320 [Marcus]
  • SPL bugs #39313 (spl_autoload triggers fatal error) and #39151 (Parse error in recursiveiteratoriterator.php) were fixed [Marcus]

Frank Kromann, as maintainer of ext/ming, intervened when he spotted Marcus making a configuration change in his extension. He explained that missing header checks should be added to ming.h in the next libming release, rather than to the PHP extension's configure script; in fact, Marcus' approach had broken the Windows build.

Over in CVS HEAD, Pierre committed his initial Unicode support for ext/zip. He noted that entry names will be converted to ASCII, including filenames and paths used as entry names. Path and filenames are otherwise encoded using php_stream_path_param_encode(). He wasn't certain about his stream implementations, nor what the default format should be there, and asked for comments and suggestions.

Nuno started work on converting ext/tidy to Unicode awareness. His commit message implied that he'd added a converter pointer for each node, allowing text to be converted on request. Andrei Zmievski queried whether a separate converter per node was actually necessary. Was it possible for the nodes to be different? Nuno conceded that it wasn't, and explained that actually in his implementation the child nodes simply point to the converter associated with the current HTML string. He was storing a pointer to that converter, alongside a reference counter, so that the child nodes could be accessed directly. That said, he still needed time to think over his approach.

Not to be outdone, Marcus also began thinking about upgrading his corner of PHP this week. The 'low-hanging fruit' in SPL is now marked as Unicode ready.

Short version: Will Andrei get his way and manage a preview release of PHP 6 before Christmas? All bets are on.

PAT: Scalar type hinting

Hannes, as it turned out, was perfectly willing to help me trawl through README.UPDATE_5_2 and to clarify some of the items he'd added to the file. There still wasn't enough time to check every single prototype listed in there prior to release, and a few of those prototypes - including an entire section in ext/date - were corrected by various core developers after Ilia committed our efforts.

Pierre noticed that a couple of lines in zend_hash.c had been mistakenly removed from CVS HEAD, and provided a patch to restore them, which was immediately applied by Johannes Schlüter.

Rui Hirokawa applied a massive ext/mbstring patch bringing Japanese legacy encoding support to the PHP_5_2 branch, and attributed it to someone named Moriyama. He then applied a fix for illegal encoding detection under mbstring.encoding_translation, this time to both the PHP_4_4 and PHP_5_2 branches, and attributed to someone else as yet unknown to the core team, Komura.

One Nico Sabbi provided an ext/dom patch to prevent errors in XML data triggering an E_WARNING in dom_document_parser(), regardless of the set error_reporting value, when the recover property is set. He felt strongly that PHP should not take initiatives of its own accord, and asked that his change be considered for inclusion in CVS HEAD. The patch and the premise are both incorrect; DOMDocument->recover is only used to toggle that E_WARNING message, and is 0 by default.

Ilia applied a patch to both PHP_5_2 branch and CVS HEAD to fix ext/session bug #39265 (Fixed path handling inside mod_files.sh) and credited it to Michal Taborsky, who also reported the issue.

Hannes mischievously posted a patch providing support for scalar type hinting in PHP_5_2 branch, claiming that he simply wanted to archive the code somewhere he could find it later. Strangely, he also provided a link to a tarballed copy of this patch in his online archive. The tarball also contains all necessary changes to the existing test suite and a batch of new .phpt scripts to go with his code.

Naturally, Hannes' patch provoked some discussion. Pierre didn't want to allow scalar type hinting unless it went in as part of a "strongly typed" mode; but in any case, he disliked the idea of PHP raising an error should he pass string("1") instead of int(1). Marcus didn't want to support scalar type hinting unless automatic type conversion was also fully supported - in which case, why use it at all? He added that this was the reason the team had declined similar proposals repeatedly in the past. 'Exactly', wrote Pierre. 'Exactly++', wrote Zeev Suraski. He also felt it would make no sense to have two different type-hinting semantics, depending on whether you were ensuring the correct type or converting to the correct type. Ron Korving thought scalar type hinting would be more useful if it attempted auto-conversion. Brian Moon liked the concept, but conceded that he saw problems with it - mainly from request data, which is, as Brian wrote, 'all strings'. The only way he could see it working would be if the type hints converted the data and tested for a change. "1" and 1 would be considered the same, but non-numeric strings converting to 0 ought to throw an error. Brian added as an aside that currently the lines:

function test (scalar $var) {
    echo $var;
}

result in:

Fatal error: Argument 1 passed to test() must be an object of class scalar

- something he found 'funny'. (In which sense, he didn't say.)
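Brian's 'convert and test for a change' idea can be approximated in userland, which may help to show what he had in mind. The function below is a hypothetical sketch with an invented name; it is not Hannes' patch and does not describe how any PHP version implements type hints.

```php
<?php
// Hypothetical helper illustrating "convert, then test for a change".
function coerce_int_or_fail($value)
{
    // A real integer passes straight through.
    if (is_int($value)) {
        return $value;
    }

    // A string is accepted only if converting it to int and back
    // reproduces the original string exactly, so "1" passes while
    // "abc" (which converts to 0) and "1 apple" do not.
    if (is_string($value) && (string) (int) $value === $value) {
        return (int) $value;
    }

    throw new InvalidArgumentException('Expected an integer-like value');
}

var_dump(coerce_int_or_fail('1'));

try {
    coerce_int_or_fail('abc');
} catch (InvalidArgumentException $e) {
    echo $e->getMessage(), "\n";
}
```

Under this rule request data, which arrives as strings, would still be usable, while silent conversion of junk to 0 would be refused.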

Ilia alone appeared to like the concept of scalar type hinting, but even Ilia wrote that he didn't want to see it in the PHP_5_2 branch.

With that out of the way, patch king Matt Wilmas turned up with an optimization for zend_hash_copy()/zend_hash_merge() - it seemed to him that zend_hash_quick_*() functions could be used for associative entries, saving the key from being hashed twice over. Dmitry investigated, and later applied Matt's latest changes in PHP_5_2 branch and CVS HEAD.

Short version: Hannes had an Evil Moment, but the Empire fought back.