Zend Weekly Summaries Issue #313

BUG: RTLD_DEEPBIND [continued]
TLK: More strict than Liskov
TLK: spl_object_hash docs
TLK: Date Wars revival attempt
BUG: 301 redirects
TLK: allow_url_include and php:/data:
CVS: Starting out with 5.2.1
PAT: Scalar type hinting

BUG: RTLD_DEEPBIND [continued]

RedHat contributor Joe Orton, who was responsible for introducing
RTLD_DEEPBIND support into the PHP core, picked up on last week’s exchange and immediately
asked Brian France the obvious question: ‘What problems do you see?’

Brian’s reply was less than comforting: ‘Where to start?’ It seems that
the Yahoo! team link their PHP extensions against a homegrown static library,
apstubs. This library defines all exported Apache elements as ‘weak’. If a
PHP extension written for Apache were run under Apache, the ‘strong’ symbols from
the server library would override the ‘weak’ ones in apstubs, but those weak
symbols would be utilized if the extension ran under the CLI SAPI.
RTLD_DEEPBIND forces PHP to use the weak apstubs symbols even
when the strong Apache symbols are present and, while Brian perfectly understood the
reasoning behind this in PHP code generally, this broke the behaviour the Yahoo!
crew rely upon.

A second, more generic issue the team had found was that, when
RTLD_DEEPBIND is defined, the rtld couldn’t find the
symbols from a library that had already been loaded. Brian was having trouble
getting to the root of this issue, but guessed that libraries were not being
unloaded between the first and second pass of Apache startup. Turning off
RTLD_DEEPBIND prevented the ensuing core dump, but not the unloading
issue itself. There was a related bug report open against RedHat, and Brian was
about to test whether removing RTLD_DEEPBIND resolved that too. He
added that there would be either a new or updated RH bug later, should the PHP or
FreeBSD kernel teams at Y! find anything solid.

Short version: The first doesn’t affect PHP generally; the second might.

TLK: More strict than Liskov

Christian Schneider was bemused. Some weeks ago, he supplied an OO strictness
patch to prevent PHP complaining about added default parameter values or changes to
static methods that do not break the Liskov substitution principle. He had had no
feedback over this, at all. Why, Christian wanted to know, had his patch not been
considered?

Marcus Börger explained that the Liskov principle applies to static methods when
calls via objects are common, which is the case with PHP. As with many other
languages, PHP’s static methods are inherited in the same way as any other method.
Adherence to Liskov rules would mean that PHP users could add default parameter
values, add new parameters with default values or change type hints in derived class
methods. C++ had already proved that allowing users to alter default parameter
values was not a good idea in a compiled language; it resulted in unexpected default
values. Although it had a better chance of working as advertised in PHP, allowing it
would confuse programmers coming from C++, and could prevent optimizations in the
future if PHP ever adopted the C++ idea of compile time function invocation binding.
The same was true when it came to allowing additional parameters; although it could
work in PHP, it would be more confusing than helpful. Finally, the ability to change
type hints in derived class methods had been decided against at PDM; none of the
core team knew of a language that supports it, and its rules are non-intuitive.
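
As a rough illustration of the kind of change under discussion, the sketch below has a derived class add a parameter with a default value. The class and function names are hypothetical, invented for this example. Every call that is valid against the base method remains valid against the derived one, which is why Christian regarded such changes as Liskov-safe; according to the thread, PHP's strictness checks at the time complained about them anyway, and other PHP versions may behave differently.

```php
<?php
// Hypothetical classes, for illustration only.
class Logger
{
    public function write($message)
    {
        return "base: $message";
    }
}

class PrefixLogger extends Logger
{
    // Adds a parameter with a default value. Any call that works
    // against Logger::write() still works against this method.
    public function write($message, $prefix = '[app]')
    {
        return "$prefix $message";
    }
}

// Code written against the base type, unaware of the extra parameter.
function emit(Logger $logger)
{
    return $logger->write('started');
}

echo emit(new Logger()), "\n";
echo emit(new PrefixLogger()), "\n";
```

The caller in emit() never needs to know about the extra parameter, which is the substitutability argument in a nutshell.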

Marcus moved his focus to E_DEPRECATED; it still wasn’t known
whether this would be introduced in PHP 5.3.0 or PHP 6.0.0. The team would
appreciate help to make it come about quickly.

Christian wrote simply that he hadn’t realized how big an impact C++ has on PHP
design, and was very glad they were only discussing E_STRICT! He wanted
to know what kind of help the team needed, and offered to put some time into making
E_DEPRECATED possible.

Marcus explained that language designers often copy from each other; PHP’s own
object model was a combination of influences from C++, Java, Delphi, a little C#,
maybe some Python, and some stuff he couldn’t even specify a counterpart for. He’d
only mentioned C++ on this occasion because the specific behaviour under discussion
could be best illustrated that way, and in his experience most people prefer
explanations by example to tracts of theory. However, if anyone happened to prefer
theory, he recommended two books: Palsberg and Schwartzbach’s Object-Oriented Type
Systems (‘easy’), and Abadi and Cardelli’s A Theory of Objects (‘a pretty
mathematical approach’).
Returning to the issue of E_DEPRECATED, Marcus felt that a list of the
existing E_STRICT messages would be very useful, and looked forward to
seeing it. Christian posted a shell script to produce such a list, along with a
small prayer that the core team wouldn’t fall into the trap of not seeing the great
potential in being different from other languages.

PHP user Jeff Moore thanked Marcus for his ‘enlightening explanation’,
commenting that basically what it came down to was that PHP’s E_STRICT
rules are intentionally more strict than Liskov. Marcus agreed that this was
the case, and referred back to the PDM decision over type hints and non-intuitive
behaviour.

Short version: More than you ever wanted to know about OO theory and
the politics of language design.

TLK: spl_object_hash docs

Stas Malyshev had come across the new spl_object_hash() function,
which he welcomed, but had then discovered that it wasn’t documented in the PHP
manual. Was there some reason for that?

Hannes was quick to write about the ‘human resource problem’ for the
benefit of anyone passing. Marcus, however, explained that in this case the lack of
documentation was simply an oversight. He added that the function simply generates a
unique md5 string from an object’s id and object handler address.
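
A minimal sketch of how the function behaves from userland, using throwaway stdClass objects: the same object always yields the same hash, and two objects that are alive at the same time yield different ones.

```php
<?php
$first  = new stdClass();
$second = new stdClass();

$firstHash  = spl_object_hash($first);
$secondHash = spl_object_hash($second);

// The hash is a 32-character string.
var_dump(strlen($firstHash));

// Hashing the same object again gives the same string.
var_dump($firstHash === spl_object_hash($first));

// Two objects that exist at the same time get different hashes.
var_dump($firstHash === $secondHash);
```

This makes the hash usable as an array key when you need to track a set of objects.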

Short version: Freshly baked docs available here!

TLK: Date Wars revival attempt

Mauro N. Infantino wrote to internals@ with a problem. In his team’s PHP 5 code
base, there happened to be a class named DateTime. Assuming they were
not alone in this, Mauro wondered if there was any chance an INI directive might be
added into PHP that would offer the option of disabling the new class definition. He
went on to say that they’d had a lot of PHP 4 projects, all of which had migrated to
PHP 5.1 with surprising ease, but changing every date creation – and the type hints
– would be a nightmare. ‘How could it be more difficult to upgrade from PHP 5.1
to PHP 5.2?’

Rasmus Lerdorf explained briefly that far more users had a class named
Date, and the PHP internal class name had already been changed to
DateTime because of this. Mauro would simply have to change his own
class. Ilia Alshanetsky, more sympathetically, introduced Mauro to the concept of
prefixing class names, and explained that PHP was likely to introduce more internal
classes into its core in the foreseeable future. That said, if there was no option
but to disable it, Mauro could edit the PHP source and rename the native class to
something like DateTime2.
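
The prefixing approach Ilia describes might look something like the sketch below. The Acme_ prefix and the class body are hypothetical, chosen only for illustration; the point is that the application class no longer competes with the core class for the name DateTime.

```php
<?php
// "Acme_" is a hypothetical application prefix.
class Acme_DateTime
{
    private $timestamp;

    public function __construct($timestamp)
    {
        $this->timestamp = (int) $timestamp;
    }

    // Formats the stored Unix timestamp in UTC.
    public function format($format)
    {
        return gmdate($format, $this->timestamp);
    }
}

$epoch = new Acme_DateTime(0);
echo $epoch->format('Y-m-d'), "\n";

// The core class and the application class now coexist.
var_dump(class_exists('DateTime', false));
var_dump(class_exists('Acme_DateTime', false));
```

The cost, as Christian argues in the thread, is that every type hint and instantiation in the application has to carry the prefix.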

Christian wrote that he was still wary of the prefix approach; it puts the burden
onto PHP users, who write more code than is in the more controllable core and
libraries of PHP. He felt strongly that core classes should be namespaced, full
stop. Ilia, who believes equally strongly that prefixing is an elementary rule in OO
programming, dismissed this argument. The language, he wrote, should always
have the best possible names, simplifying development for the majority of its users.
However popular an application might be, PHP is more popular; the needs of the many
outweigh the needs of the few. Christian partly agreed with him, but pointed out
that ‘the many’ now need to care about naming; this extra rule, in his opinion,
makes the language a little less simple to use. Marcus intervened to point out that,
while Date, DateTime or Time are all common
names, prefixing core class names with Php was just as likely to be
problematic. Should the PHP dev team have given the core class a ridiculous name to
avoid the problem? Wasn’t it easier for users not to have to check the manual every
time they wanted to use a core class? Christian was willing to bet that ‘the
number of PhpDate classes out there is considerably smaller than the
number of Date classes’. He believed that having a standard prefix
in the core could even allow the team to add an E_STRICT warning when
userspace classes with the same prefix were used. His favourite solution would be to
have namespaces handling the whole situation, rather than educate users to prefix
their own classes. PHP user Kevin Waterson pointed out that, since PHP
doesn’t support namespaces, the users should be prefixing. Ron Korving
argued that it wasn’t possible to rely on application developers to do this. He
agreed with Christian’s view that ‘prefixing application classes is a big burden
compared to prefixing PHP core classes’. Furthermore, with the rising popularity
of the ActiveRecord design pattern, database table names are starting
to dictate class names. PHP needs to change, not the user approach; PHP needs
namespace support. Lukas Smith pointed out that prefixing is trivial; ‘all people
need to do is stick a single underscore into their class name. That’s all.’ As
for the ActiveRecord issue, he didn’t see a problem with applying a
default prefix. Finally, he’d yet to hear any of the PHP core developers say they
were against namespace support…

Ilia pointed out that the discussion over what to name the core datetime class
had been held in the public domain, and the options on offer had been based on the
team’s analysis of the class names currently used in application code. A Google
search on DateTime had turned up only about eight PHP applications;
Ilia felt this confirmed the soundness of the choice. Besides, users don’t
have to upgrade immediately, and shared hosts are very rarely up to date. PHP
user Lester Caine argued that anyone using a shared host would need to fix their
code regardless, because a host can update at whim. Those using third party
applications wouldn’t even know they had a problem until the host upgraded, never
mind how to fix it. He felt there should be some mechanism in place to override core
class names, simply because of this. Marcus pointed out that exactly the same issues
had existed with core function names for years, and wondered why it’s suddenly such
a big problem when it comes to class names? There is a clear set of rules for core
class naming, and Marcus couldn’t see how they offered the user any less protection
than the pre-existing rules for core function naming.

Ilia and Christian meanwhile argued bitterly between themselves over the rights
of the language developer versus the rights of the application developer, with Ilia
insisting that ‘the language always has the best pick of namespaces… ANY
language’. He was still angry over the compromise solution that allowed PEAR to
retain the Date class, and hoped this ‘horrible mistake’ would
not be repeated in future. Naturally this brought Pierre-Alain Joye to the defense
of the ‘horrible mistake’; the problem wasn’t the name so much as the introduction
of the class into the global namespace, without previous warning, in the final days
leading up to a minor release. Things are getting better, in that some developers
now need to justify the names they use in their extensions; but it would be better
still, wrote Pierre, if this rule applied to all.

Lester observed that all his applications had failed with the core
DateTime class because his own are named DateTime too.
Ironically, he’d chosen that name because Date was the obvious name for
the core class. He still didn’t see why it had to be hard coded into the core; the
class should just be loadable, ‘like any other good extensible language’. In
fact, in Lester’s view, any time there’d been a BC break the ensuing problems could
have been avoided by providing a simple switch to enable the new code only if it was
needed.

Short version: Read the userland naming guide in the PHP manual to avoid the
possibility of future conflicts.

BUG: 301 redirects

One Ian Evans was having problems redirecting pages, and wrote to internals@ to
check whether this was a known issue before submitting it as a bug report. He was
running PHP 5.1.4 as FastCGI under lighttpd, and had found that his HTTP 301
redirects were returning HTTP 302 instead. His code read:

header("Status: 301 Moved Permanently");
header("Location: mynewurl");
exit();

and his header checker was returning this:

#1 Server Response: oldurl
HTTP Status Code: HTTP/1.0 302 Found
Connection: close
X-Powered-By: PHP/5.1.4
Location: mynewurl

– keeping his old pages in Google rather than using the new location. Ian added
that he’d already tried a variety of permutations in the code, following suggestions
on the PHP general list, but nothing was working for him.

Various people promptly suggested a variety of permutations in the code, but Ian
insisted that all of them – including the correct version,

header("HTTP/1.1 301 Moved Permanently");

– returned HTTP 302 under his set-up. Eventually, Edin Kadribasic suggested that this
might have more to do with the lighttpd server itself than with PHP – in fact he’d
had issues with headers under lighttpd himself. He thought Ian should contact the
development team responsible for the lighttpd server code.
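
For reference, header() also accepts an explicit response code as its third argument, so the status and the Location header can be set in one call. The helper below is a hypothetical sketch (the function name and example URL are assumptions). The thread does not establish whether this form would have behaved any differently under Ian's lighttpd and FastCGI set-up.

```php
<?php
// Hypothetical helper: sends a Location header together with a 301 status.
// header()'s second argument replaces any earlier header of the same name;
// the third argument sets the HTTP response code.
function send_permanent_redirect($url)
{
    header('Location: ' . $url, true, 301);
}

// Typical use, before any output has been sent:
// send_permanent_redirect('http://example.com/new-page');
// exit();
```

Calling exit() afterwards, as Ian's code does, stops the rest of the script from running once the redirect has been issued.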

Short version: It’s not always PHP’s fault.

TLK: allow_url_include and php:/data:

Stas had come across a blog entry from Stefan Esser in which the security expert had
claimed that the allow_url_fopen and allow_url_include restrictions could easily be
worked around by using php: and data: URLs. Realizing that Stefan was
correct, Stas felt this should be fixed forthwith. Rasmus agreed; he’d also seen the
blog entry, and had even discussed a fix with Wez Furlong earlier. He posted a patch that he
believed would catch the cases concerned, and asked if people could double check to
make sure it offered protection against php:/data: attacks.

Nuno Lopes was furious with Stefan for blogging the problem rather than alerting
the PHP development team or fixing it himself. Stefan defended himself, explaining
that he had in fact raised the topic to the core team several months earlier. Nuno,
who hadn’t been aware of this, immediately apologized for his outburst.

Peter Brodersen wondered whether Rasmus’ patch would also prevent requests to a
SMB server, e.g. \\10.20.30.40\evil\malicious_php_code.txt? It appeared
to him that SMB server requests are regarded as part of the default filesystem
wrapper. Nuno noted that this was a Windows only issue, but one that should probably
be addressed. Stas wasn’t certain whether it could be restricted from the OS side;
Ilia felt it would be wrong to consider a networked filesystem as non-local. He
pointed out that there’s no way to identify them reliably, and if this particular
‘perfectly valid usage’ were to be disallowed by default a large number of
applications could break. Wez disagreed; he thought a random host specified in this
way should be treated as suspicious, and had no problem with disallowing
includes for Windows paths beginning with a double backslash when
allow_url_include is disabled. Ilia wondered what Wez’ definition of ‘a
random host’ was here. Peter explained; it would cover any SMB server requested via
PHP, e.g. \\smbserver\file.txt, rather than through a device mount in
the operating system, e.g. Z:\file.txt.

In fact, Peter had obviously given the issue some thought. He wrote that,
although it isn’t possible to distinguish between requests to a local SMB server and
a non-local server, a file request via one network protocol really shouldn’t be any
different from a similar request via any other protocol. The task was the same,
after all. Peter felt the key lay in mapping allowed SMB servers as local devices
through the operating system. Requesting Z:\file.txt would then be
perfectly fine, and the responsibility of performing the network operation would
belong entirely to the operating system, based on central server administration,
rather than to PHP. If you actually needed to fetch files through arbitrary external
hosts using PHP, switching on allow_url_include would still be an
option.

Ilia argued that there was no way to recognize a SMB device. He was also unhappy
about the idea of breaking valid applications that perform operations on networked
filesystems. Rasmus explained that the idea was simply to mark SMB servers as
is_url – it had nothing to do with performing normal operations on a
networked filesystem. ‘How many real apps rely on being able to execute code via
a SMB include?’ he asked, pointing out that Ilia’s argument could be made for a
localhost HTTP or FTP include, which is also disallowed. If someone can map a remote
machine to their local drive, they have effectively configured their valid hosts.
After all, ‘if a bad guy can mount remote filesystems onto your server… you
have bigger problems’. Ilia replied that many real applications will happily
install on a SMB share. He’d often seen it done in a Windows environment, and even
under Linux for backup purposes, with PHP creating the backup and writing it to the
storage machine via SMB. The downside of the offered solution was that not all users
are able to mount the SMB system, either through permission restrictions or through
lack of know-how. While the latter could be resolved through documentation, the
permissions issue would be a bigger problem. Further, there are good reasons not to
allow localhost access for HTTP – it opens the Web server to a DoS attack via
request loop.

Stas wrote that, if PHP was going to offer a security policy that disallows
non-local code, it didn’t make much sense to do the disallowing under HTTP only.
Rasmus was of like mind; he felt that the policy should be to ‘disallow anything
that in any way looks like it could be a remote include, even if under the covers it
isn’t’. Ilia argued that valid usage of
require()/include() via a URL is quite unusual across
HTTP; the same, in his experience, is not true of SMB. That said, he considered
Stas’ point about security a valid one; but at the same time, there needed to be
consideration of the impact on existing PHP applications of marking
smb:// addresses as actual URLs. Every remote code execution hack he’d
seen had been HTTP based, because HTTP provides a high degree of anonymity. A SMB
hack would require an open SMB share, which was trickier; usually this would
translate into an exploited Windows machine that accepts incoming SMB
connections.

Stas wondered just how many applications actually need to import includes
from foreign systems? It sounded unsafe to him. His assessment was that people don’t
generally do it on purpose – but he was willing to be educated on that point, if
anyone knew better. He also pointed out that SMB can be just as anonymous as HTTP;
HTTP is used more by hackers simply because HTTP hosting is more commonly
available.

Richard Quadling, as a developer working solely with a Windows network,
intervened to give his view. He admitted to regularly using includes via a double
backslash rather than a mapped drive; having a restriction on \\ would
be a problem for him. However, it was simple to work around, and would make PHP more
secure, and on that basis he would be happy with the restriction. He noted that it
might pose much more of a problem for shared hosts offering Windows.

Tom Sommer thought of it as a network issue. If the administrator hadn’t blocked
access to remote SMB servers on the network, s/he was simply asking for trouble. Tom
had a similar view when it came to code that includes URLs. Including
from network mounts, though, might have valid uses. Stas agreed; you wouldn’t need
to do anything for the Windows client to allow \\IP\share\file,
providing that the box allowed anonymous SMB and there was TCP/IP access to it. On
the other hand, setting the share as a mapped drive requires some effort on the part
of the client. He therefore felt the line should be drawn at ‘letter OK, \\IP not OK’
– something, he added happily, that is also easy to do.

Ilia conceded defeat and agreed to add the restriction, since he appeared to be
the only one arguing against it.
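
The ‘letter OK, \\IP not OK’ rule can be sketched in userland as a simple prefix test. This is only an illustration of the idea, using a hypothetical function name; it is not the check that went into the PHP source, which is not shown in the thread.

```php
<?php
// Hypothetical helper: treats any path beginning with a double
// backslash as a network (UNC) path, and everything else as local.
function looks_like_unc_path($path)
{
    return strncmp($path, '\\\\', 2) === 0;
}

// A share addressed directly by host is flagged...
var_dump(looks_like_unc_path('\\\\10.20.30.40\\evil\\malicious_php_code.txt'));

// ...while a mapped drive letter is not.
var_dump(looks_like_unc_path('Z:\\file.txt'));
```

Under the policy discussed above, a path flagged this way would only be includable with allow_url_include switched on.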

Short version: \\IP is not OK any more.

CVS: Starting out with 5.2.1

Changes in CVS that you should probably be aware of include:

  • A single last fix in the Zend Engine before PHP 5.2.0 was rolled – bug #39304 (Segmentation fault
    with list unpacking of string offset) [Dmitry Stogov]
  • The missing basic type handling in json_decode() was backported
    to PHP_5_2 branch, closing bug report #38680 [Ilia]
  • Core bug #39215 (Inappropriate close of stdin/stdout/stderr) was fixed [Ilia]
  • ext/curl can now be built against libcurl 7.16.0 in all current
    branches of PHP, closing bug #39354 [Ilia]
  • ext/zip gained a new userspace method (addEmptyDir()) and
    three new internal methods (zip_stat_init(),
    zip_error_clear() and zip_file_error_clear()) [Pierre]
  • In ext/dba, bug #38698 (for some keys cdbmake creates corrupted db and
    cdb can’t read valid db) was fixed [Marcus]
  • In ext/mbstring, bug #39364 (Removed warning on empty haystack inside
    mb_strstr()) was fixed [Ilia]
  • In ext/filter, bug #39358 (INSTALL_HEADERS contains incorrect reference to
    php_filter.h) was fixed [Ilia]
  • In ext/gd, bugs #39273 (imagecopyresized() may ignore alpha channel) and
    #39366 (imagerotate() does not use alpha with angles > 45) were fixed [Pierre]
  • Filter support for $_SERVER in the CGI and Apache2 SAPIs was
    backported to PHP_5_2 branch [Ilia]
  • An optional fourth parameter, n_retries, was added to
    imap_open() and imap_reopen() in 5_2 and HEAD, fixing
    bug #39362 [Ilia]
  • Internals folk will be happy to know that the hash_apply
    functions are more consistent now, closing bug #39320 [Marcus]
  • SPL bugs #39313
    (spl_autoload triggers fatal error) and #39151 (Parse error in
    recursiveiteratoriterator.php) were fixed [Marcus]

Frank Kromann, as maintainer of ext/ming, intervened when he spotted
Marcus making a configuration change in his extension. He explained that missing
header checks should be added to ming.h in the next libming
release, rather than to the PHP extension’s configure script; in fact, Marcus’
approach had broken the Windows build.

Over in CVS HEAD, Pierre committed his initial Unicode support for
ext/zip. He noted that entry names will be converted to ASCII, including
filenames and paths used as entry names. Path and filenames are otherwise encoded
using php_stream_path_param_encode(). He wasn’t certain about his
stream implementations, nor what the default format should be there, and asked for
comments and suggestions.

Nuno started work on converting ext/tidy to Unicode awareness. His commit
message implied that he’d added a converter pointer for each node, allowing text to
be converted on request. Andrei Zmievski queried whether a separate converter per
node was actually necessary. Was it possible for the nodes to be different? Nuno
conceded that it wasn’t, and explained that actually in his implementation the child
nodes simply point to the converter associated with the current HTML string. He was
storing a pointer to that converter, alongside a reference counter, so that the
child nodes could be accessed directly. That said, he still needed time to think
over his approach.

Not to be outdone, Marcus also began thinking about upgrading his corner of PHP
this week. The ‘low-hanging fruit’ in SPL is now marked as Unicode ready.

Short version: Will Andrei get his way and manage a preview release of
PHP 6 before Christmas? All bets are on.

PAT: Scalar type hinting

Hannes, as it turned out, was perfectly willing to help me trawl through
README.UPDATE_5_2 and to clarify some of the items he’d added to the file.
There still wasn’t enough time to check every single prototype listed in there prior
to release, and a few of those prototypes – including an entire section in
ext/date – were corrected by various core developers after Ilia committed our
efforts.

Pierre noticed that a couple of lines in zend_hash.c had been mistakenly
removed from CVS HEAD, and provided a patch to restore them, which was immediately
applied by Johannes Schlüter.

Rui Hirokawa applied a massive ext/mbstring patch bringing Japanese legacy
encoding support to the PHP_5_2 branch, and attributed it to someone named Moriyama.
He then applied a fix for illegal encoding detection under
mbstring.encoding_translation, this time to both the PHP_4_4 and
PHP_5_2 branches, and attributed to someone else as yet unknown to the core team,
Komura.

One Nico Sabbi provided an ext/dom patch to prevent errors in XML data
triggering an E_WARNING in dom_document_parser(),
regardless of the set error_reporting value, when the
recover property is set. He felt strongly that PHP should not take
initiatives of its own accord, and asked that his change be considered for inclusion
in CVS HEAD. The patch and the premise are both incorrect;
DOMDocument->recover is only used to toggle that
E_WARNING message, and is 0 by default.

Ilia applied a patch to both PHP_5_2 branch and CVS HEAD to fix
ext/session bug #39265 (Fixed path handling inside mod_files.sh) and credited it
to Michal Taborsky, who also reported the issue.

Hannes mischievously posted a patch providing support for scalar type hinting in
PHP_5_2 branch, claiming that he simply wanted to archive the code somewhere he
could find it later. Strangely, he also provided a link to a tarballed copy
of this patch in his online archive. The tarball also contains
all necessary changes to the existing test suite and a batch of new .phpt
scripts to go with his code.

Naturally, Hannes’ patch provoked some discussion. Pierre didn’t want to allow
scalar type hinting unless it went in as part of a “strongly typed” mode; but in any
case, he disliked the idea of PHP raising an error should he pass
string("1") instead of int(1). Marcus didn’t want to
support scalar type hinting unless automatic type conversion was also fully
supported – in which case, why use it at all? He added that this was the reason the
team had declined similar proposals repeatedly in the past. ‘Exactly’, wrote
Pierre. ‘Exactly++’, wrote Zeev Suraski. He also felt it would make no sense
to have two different type-hinting semantics, depending on whether you were ensuring
the correct type or converting to the correct type. Ron Korving thought scalar type
hinting would be more useful if it attempted auto-conversion. Brian Moon liked the
concept, but conceded that he saw problems with it – mainly from request data, which
is, as Brian wrote, ‘all strings’. The only way he could see it working would
be if the type hints converted the data and tested for a change. "1"
and 1 would be considered the same, but non-numeric strings converting
to 0 ought to throw an error. Brian added as an aside that currently
the lines:

function test (scalar $var) {
    echo $var;
}

result in:

Fatal error: Argument 1 passed to test() must be an object of class scalar

– something he found ‘funny’. (In which sense, he didn’t say.)
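
Brian's convert-and-test idea can be sketched in userland roughly as follows. The function name is hypothetical and the code is not taken from Hannes' patch; it simply converts a string to an integer and rejects the value if converting back does not reproduce the original input. Note that this particular check is deliberately strict, so strings such as "01" or " 1" would also be rejected.

```php
<?php
// Hypothetical helper illustrating "convert, then test for a change".
function coerce_to_int($value)
{
    if (is_int($value)) {
        return $value;
    }
    // Accept a string only if the int round-trips to the same string.
    if (is_string($value) && (string) (int) $value === $value) {
        return (int) $value;
    }
    throw new InvalidArgumentException('Value does not convert cleanly to an integer');
}

var_dump(coerce_to_int(1));    // already an integer
var_dump(coerce_to_int("1"));  // numeric string, converts cleanly

try {
    coerce_to_int("abc");      // would convert to 0, so it is rejected
} catch (InvalidArgumentException $e) {
    echo "rejected: ", $e->getMessage(), "\n";
}
```

With a check of this kind, request data arriving as "1" would pass, while a non-numeric string would raise an error instead of silently becoming 0.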

Ilia alone appeared to like the concept of scalar type hinting, but even Ilia
wrote that he didn’t want to see it in the PHP_5_2 branch.

With that out of the way, patch king Matt Wilmas turned up with an optimization
for zend_hash_copy()/zend_hash_merge() – it seemed to him
that zend_hash_quick_*() functions could be used for associative
entries, saving the key from being hashed twice over. Dmitry investigated, and later
applied Matt’s latest changes in PHP_5_2 branch and CVS HEAD.

Short version: Hannes had an Evil Moment, but the Empire fought back.