Zend Weekly Summaries Issue #366



TLK: Safe mode [again]
TLK: Ignored patches
BUG: always_inline symbol clash
REQ: conf.d-like support
TLK: Preliminary taint support [continued]
NEW: PHP 5.2.5
TLK: Parallel database queries
CVS: T_IMPORT vs T_USE resolved, yay
PAT: __sleep

4th November – 10th November 2007

TLK: Safe mode [again]

Mark Krenz picked up the trail of his sporadic one-man campaign to have
safe_mode reinstated in PHP 6, explaining that he hadn’t had time
to deal with the discussion back in the day. In his search for 100% security
in a shared environment, Mark was now preparing to look into
mpm_itk as a means to control user access. However, the best solution
he’d found for securing PHP to date was still to run PHP in safe mode and
rely on safe_mode_exec_dir to prevent users from running arbitrary
executables. After a lengthy security analysis of the options available to
him, he still hadn’t found anything to replace it. If the PHP development team
could just provide a replacement for safe_mode_exec_dir, Mark would
be much happier. open_basedir wasn’t a solution; a user could
still upload a copy of cat and run something like:


print exec('/home/myuser/www/cat /home/otheruser/private/mysqlinfo');


Mark realized he’d probably be advised to disable all the exec
functions, ‘but tell that to joe user who wants to be able to run some
popular photo gallery software or blog that needs to run an external command
like ImageMagick’. He also recognized that an execution directory
restriction would be vulnerable to arbitrary arguments passed to the
restricted programs, but still saw this approach as safer than any of the
other options available. The usual advice, to run Apache in a chroot
jail, struck Mark as ‘unreasonable’, since it would entail running
200+ instances of Apache on a single server; it also wasn’t trivial to set
up.
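To make the trade-off concrete, here is a minimal userland sketch of an execution-directory restriction. It is not PHP’s own safe_mode_exec_dir code. The helper `restrict_to_exec_dir()` and the directory `/usr/local/php/bin` are hypothetical, and the sketch uses modern PHP syntax (type declarations). The helper keeps only the program’s base name and resolves it inside the allowed directory, so an uploaded copy of cat is never the binary that runs. As Mark conceded, though, the arguments pass through untouched.

```php
<?php
// Hypothetical helper: force the program part of a shell command to
// come from a single allowed directory. Arguments are NOT checked.
function restrict_to_exec_dir(string $command, string $execDir): string
{
    // Split "program arg1 arg2" into the program and the remainder.
    $parts = explode(' ', trim($command), 2);
    // Drop any directory the caller supplied; keep only the file name.
    $program = basename($parts[0]);
    $rest = $parts[1] ?? '';

    $restricted = rtrim($execDir, '/') . '/' . $program;
    return $rest === '' ? $restricted : $restricted . ' ' . $rest;
}

// Mark's example command, rewritten against the allowed directory.
echo restrict_to_exec_dir(
    '/home/myuser/www/cat /home/otheruser/private/mysqlinfo',
    '/usr/local/php/bin'
), PHP_EOL;
```

The rewritten command runs the system cat instead of the uploaded one. However, it still points at another user’s file, which only file permissions can protect.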

Cristian Rodriguez immediately argued that the kind of directory restriction
Mark wanted belongs at the level of the operating system. Removing
safe_mode simply eliminates a false sense of security and makes
it clearer that people should secure their servers better; ‘whoever
convinced you that it is a good thing does not have a clue.’ Mark replied
tiredly that he felt like he was ‘up against a religion’. He’d done his
own security analysis and knew what he was talking about, but the term “safe
mode” seemed to short-circuit a lot of people’s brains. What, he asked
rhetorically, could sanely be done at OS level to prevent random program
execution by Apache and its modules? All the suggestions he’d heard so far
seemed to come from those who’d never run a shared user environment; either
they simply wouldn’t work, or they’d be cost prohibitive.

At this stage Mark altered his plea slightly and made a more reasonable
request for ‘a transition period in which sane solutions are presented to
the community’. Simply dropping safe_mode and telling
everyone to deal with it was irresponsible; tools for dealing with it should
be made available, and those tools should include such things as a setting to
prevent execution outside a named directory.

Alexey Zakhlestin recommended a combination of FastCGI and
suexec, which would give every user their own instance of PHP
with the uid as owner, and cited Textdrive/Joyent as a hosting provider
happily using this approach. Mark promptly produced a document
suggesting they had poor security, and argued that for
him at least, relying on security by obscurity would mean sleepless
nights. Alexey queried this; had Mark simply ignored the FastCGI part? Mark
replied in the negative; he simply didn’t feel that fastcgi/suexec/mod_suphp
were able to handle everything. Besides, wasn’t the whole point of PHP
originally supposed to be that it was part of Apache? (It later became clear
that Mark hadn’t actually tried the FastCGI approach in recent years.)

Michael McGlothlin wondered why Mark didn’t just give every user their own
virtual machine. Mark explained that he offers tiered hosting; some of his
users pay for precisely that, but mega-cheap hosting is only financially
viable if there are a couple of hundred users per machine. Michael ran
through a few possibilities before conceding that any good solution would be
more resource intensive than safe_mode. The best he could offer
for ‘people who only want to pay $5/mo’ was to keep them on PHP 5
forever.

Nate Gordon provided some backing for Mark’s arguments; he pointed out that
it’s not always possible to run a script as a single user in a shared
environment, because the content may be owned by a group of users. He didn’t
see how it could be difficult to lock down execution to a specific directory
on a per-vhost basis via PHP, given that PHP provides the means for
execution. Nate added that he would be the first to acknowledge that the
basic premise of safe_mode is broken. What he really needed was
the ability to disable execution of anything other than PHP on a per-vhost
basis. Stefan Esser’s suhosin project provides a
per-vhost function disabling feature, but Nate really didn’t understand why
it should be left to an extension to provide that. He’d also like a per-vhost
exec_dir limit… ‘People are too quick to throw out the baby
with the bath water on safe_mode. It isn’t completely useless to
everyone.’

Peter Brodersen noted that the idea of unbundling
safe_mode_exec_dir from safe_mode had come up before and been
shelved for “later”; perhaps now was later. Basically, the need was
for a central switch for exec functions, rather than a long and
changeable list under disable_functions.
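The fragility of a hand-maintained list can be illustrated with a short sketch. The helper `uncovered_exec_functions()` is hypothetical. The exec family it checks (exec, passthru, shell_exec, system, proc_open, popen, pcntl_exec) is an illustrative list, not an official or exhaustive one; the backtick operator, for instance, goes through shell_exec. The helper reports which of those functions a given disable_functions value would leave callable.

```php
<?php
// Illustrative (not exhaustive) list of PHP functions that can run
// external programs.
const EXEC_FAMILY = [
    'exec', 'passthru', 'shell_exec', 'system',
    'proc_open', 'popen', 'pcntl_exec',
];

// Hypothetical helper: given a disable_functions value, return the
// exec-family functions that would still be callable.
function uncovered_exec_functions(string $disableFunctions): array
{
    $disabled = array_map('trim', explode(',', strtolower($disableFunctions)));
    return array_values(array_diff(EXEC_FAMILY, $disabled));
}

// A list written without proc_open and pcntl_exec in mind.
print_r(uncovered_exec_functions('exec, passthru, shell_exec, system, popen'));

// The running configuration can be checked the same way.
print_r(uncovered_exec_functions((string) ini_get('disable_functions')));
```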

Mark couldn’t have agreed more. His biggest concern was that Linux distros
would start bundling PHP 6 before it had that feature. He therefore saw this
as urgent, and wrote that he’d be willing to write documentation or a
migration guide for php.net, if someone could only provide the C skills to
get safe_mode_exec_dir – or some equivalent – into CVS HEAD very
soon.

Short version: Retaining safe_mode_exec_dir has been mooted several
times in the past and never rejected (sorry Tony).

TLK: Ignored patches

Greg Beaver had discovered for himself what it feels like to be on the blunt
end of the patch review process, and he didn’t like it much. Two of his
recent patches – one to implement multiple namespaces per file (sans
brackets), and one to remove keyword restrictions for methods – had
seemingly fallen through the cracks. Greg wrote rather bitterly that he’d
like ‘a review and feedback or even a commit, so we can still pretend that
outsider contributions have an impact on PHP, even those from annoying
people like me’.

Stas Malyshev wrote that one of those patches (removing keyword restrictions
for methods) should probably be applied, but he wasn’t sure which of Greg’s
many patches it was; there had been two along those lines, as he recalled. As
for multiple namespace support, it brought too many complications to both the
syntax and the Zend Engine, and Stas really wasn’t convinced the gain was
worth the cost.

Short version: Not ignored so much as ‘on hold’.

BUG: always_inline symbol clash

Wez Furlong took time out to report a build problem in the PHP_5_3 branch on
his Mac OS X machine. It seems that the system headers on that platform use
__attribute__((always_inline)), and zend.h now defines
always_inline as ‘something else’, causing problems when
the compiler tries to resolve that attribute name. Wez suggested prefixing the
defines used in the Zend Engine with zend_ or a similar namespacing token. In
fact, he’d assumed this was standard practice. Any other unprefixed defines
should be fixed in the same way.

Dmitry Stogov wondered if it might not be better just to define
always_inline as inline on Mac OS X, but Wez
explained that this wasn’t a platform-specific issue. Symbol leakage has the
potential to break any library using that feature – he’d just happened to
notice it on OS X. It would be best to rename the symbols to avoid
conflict. Dmitry asked for more information about the compiler and the error,
and whether there were existing reports about the issue. Wez patiently demonstrated the problem
with a faked system definition, and gave his compiler information (GCC 4.0.1)
as requested. Dmitry okayed Wez’ original patch at this stage, and asked him
to commit it – but Wez was all out of time, and that rough demo patch only
covered one small area in any case.

Short version: Symbol leakage needs attention.

REQ: conf.d-like support

A Sriram Natarajan wrote to the internals list with a request for
‘“Include” file/directory support (like “conf.d” in Apache httpd)’.
His idea was that loaded extensions could be defined in a separate file,
rather than in a single php.ini file. Sriram believed that some Linux
distributions already do this, but wanted to know whether the facility could
be considered for the standard PHP distribution.

Cristian Rodriguez introduced Sriram to the
--with-config-file-scan-dir configuration option utilized by
those Linux distributions; ‘no other black magic involved’.
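A script can check what such a build actually read at startup. The following is a minimal sketch using the real functions php_ini_loaded_file() and php_ini_scanned_files(). The wrapper name `scanned_ini_files()` is hypothetical. php_ini_scanned_files() returns false when no scan directory is in effect; otherwise it returns a comma-separated list of the extra .ini files, and the helper splits that list into an array.

```php
<?php
// Hypothetical helper: list the additional .ini files PHP read from the
// directory configured at build time with --with-config-file-scan-dir.
function scanned_ini_files(): array
{
    $list = php_ini_scanned_files();
    if ($list === false) {
        return []; // no scan directory in effect
    }
    // Entries are comma-separated and may carry newlines; drop blanks.
    $files = array_map('trim', explode(',', $list));
    return array_values(array_filter($files, 'strlen'));
}

echo 'Main php.ini: ', var_export(php_ini_loaded_file(), true), PHP_EOL;
foreach (scanned_ini_files() as $file) {
    echo 'Also loaded: ', $file, PHP_EOL;
}
```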

Short version: Sometimes things are less complicated than they
seem.

TLK: Preliminary taint support [continued]

Cristian was, however, still finding it a complicated business to build PHP
with taint support. He’d tried Wietse Venema’s most recent tarball,
which included code to update the apache2 SAPI,
but it still wasn’t compiling for him. The compiler complained about a
casting issue somewhere in the CGI SAPI and then bailed out.

Wietse tried it himself, but couldn’t reproduce the problem.

Christian Schneider didn’t have any build problems either, and wrote to
express his happiness with the patch. He posted a small patch of his own
adding taint support to func_get_arg[s](), and suggested that
the taint functions should probably be namespaced with a taint_
prefix before being integrated into the PHP core. That said, he saw taint
mode as such a useful tool that he planned to patch PHP on his team’s
development boxes, and promised Wietse more feedback in the near future if he
was authorized to go ahead with this.

Wietse thanked Christian for his patch and explained that he intended to
revise the user interface after catching up with PHP 5.2.5, since that took
priority.

Short version: Surely that should read ‘PHP_5_3’?

NEW: PHP 5.2.5

Ilia Alshanetsky, as Release Master for the PHP 5.2 series, announced the
release of PHP 5.2.5.

Gaetano Giunta went immediately to download the new release and undertake a
full analysis of the versioning information. He reported that, of the 83
extensions shipped with PHP, 7 had had changes in their source since the PHP
5.2.4 release but had not updated the versioning information as reported by
phpversion(). Of these 7, only ext/tidy had an updated
version number in the global phpinfo() page. Another extension –
ext/oci8 – had updated version information but no changes in the code.
Gaetano wasn’t certain about ext/mysql; changes had been made there,
but may have had no functional effect. Five other extensions had no versioning
information whatsoever, and a whopping 46 extensions had no versioning
information available via phpversion().

Tony Dovgal explained that ext/oci8 is released through PECL, and its
release cycle isn’t synchronized with the PHP core; the same ought to apply
to far more extensions in theory than it does in practice. That said, he felt
that core extensions should only have a version update following major changes
or additions, and this would be a rare event in a bug-fix only branch.

Gaetano argued that PECL extensions should sync better with the PHP release
cycle. He’d found that pecl/oci8 had been updated to version 1.2.4 a
single day after the PHP 5.2.4 release, making the extension shipped with PHP
5.2.4 appear different to the PECL version even though the code was exactly
the same. Even 1.2.4-dev in the core would have been better than leaving it
at 1.2.3. Still, misleading version information was better than none at
all… One problem with the core and PECL versions not being in sync was that
there is movement between the two; another was that it’s perfectly possible
for a user to load a PECL extension to replace a core extension. That makes
it impossible to rely on the PHP version when checking for the presence of a
given feature or bug, which wouldn’t be the case if every fix were
accompanied by a change to the extension’s version number.
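Gaetano’s point is easiest to see from the checking side. The sketch below uses phpversion() with an extension name, which returns that extension’s version string or false, together with version_compare(). The helper names are hypothetical, and the code uses modern PHP syntax. A check like this can only be as reliable as the version numbers the extensions report, whichever build (core or PECL) happens to be loaded.

```php
<?php
// Hypothetical helper: true if $ext is loaded and reports a version of
// at least $minVersion through phpversion().
function extension_at_least(string $ext, string $minVersion): bool
{
    $version = phpversion($ext);
    if ($version === false) {
        // Not loaded, or no version information available.
        return false;
    }
    return version_compare($version, $minVersion, '>=');
}

// Hypothetical survey in the spirit of Gaetano's analysis: which loaded
// extensions report no version at all?
function extensions_without_version(): array
{
    $missing = [];
    foreach (get_loaded_extensions() as $ext) {
        if (phpversion($ext) === false) {
            $missing[] = $ext;
        }
    }
    return $missing;
}

var_dump(extension_at_least('standard', '5.2.0'));
print_r(extensions_without_version());
```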

Tony pointed out that there is a $Revision$ CVS tag for that
purpose, and a version number doesn’t mean quite the same thing. Gaetano
disagreed; to him, it seemed the CVS tag was intended for those compiling
from source and reporting or fixing bugs, and not as something to be accessed
and used by the general PHP coder.

Short version: Gaetano also has a one-man campaign on the go.

TLK: Parallel database queries

Arend van Beelen was doing some research. He hoped to develop a shared
library that could perform database queries to multiple databases in
parallel, and he hoped to be able to use it from within PHP. His main concern
was thread safety. Arend could see three possible approaches:

  • Use multi-threading within the library, but have it return a blocking
    single-threaded API
  • Use a single thread and asynchronous socket communication
  • Use a daemon on the localhost as a middle man, allowing PHP to connect
    once and then handling all the database work itself before passing back a
    result

Given that the aims included stability and minimal overhead, could anyone
advise him about the pros and cons of these solutions?

One Donal McMullen pointed out that a ‘maxed out’ CPU on a web server
won’t go away on its own, but if that wasn’t the issue, the
curl_multi_* functions can be useful. A cheap way to parallelize
database or data-object access would be to implement a services-oriented
architecture and call that library from a script using said functions. The
advantage of this approach was its quick and easy implementation; it would,
however, introduce latency into data retrieval, making it slower for most
applications.
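Donal’s suggestion can be sketched with PHP’s curl_multi_* functions (ext/curl must be loaded). The helper `fetch_parallel()` is hypothetical, and the example URLs stand in for whatever service endpoints would sit in front of the databases. All transfers run in a single thread. Between calls to curl_multi_exec(), curl_multi_select() waits for socket activity, which is the same single-threaded, event-driven style Rasmus later recommended.

```php
<?php
// Hypothetical helper: fetch several URLs concurrently from one thread
// using curl_multi_* (requires ext/curl).
function fetch_parallel(array $urls, float $timeout = 1.0): array
{
    $multi = curl_multi_init();
    $handles = [];

    foreach ($urls as $key => $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // keep bodies in memory
        curl_multi_add_handle($multi, $ch);
        $handles[$key] = $ch;
    }

    // Drive every transfer until none is still running.
    do {
        $status = curl_multi_exec($multi, $running);
        if ($running > 0) {
            // Wait until a socket is ready or the timeout expires.
            curl_multi_select($multi, $timeout);
        }
    } while ($running > 0 && $status === CURLM_OK);

    $results = [];
    foreach ($handles as $key => $ch) {
        $results[$key] = curl_multi_getcontent($ch);
        curl_multi_remove_handle($multi, $ch);
    }
    curl_multi_close($multi);

    return $results;
}

// Usage (hypothetical service endpoints):
// $rows = fetch_parallel([
//     'users'  => 'http://db-service-1.example.com/users?id=42',
//     'orders' => 'http://db-service-2.example.com/orders?user=42',
// ]);
```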

Arend thanked Donal for his suggestion, and agreed it might provide some
quick solutions in a small scale venture. He’d omitted to explain that his
own situation involved literally hundreds of servers, and the farm is
growing. Each time the number of web servers increased, the databases became
the bottleneck. Although parallel querying wouldn’t magically resolve the
bottlenecks, the specific problem Arend was addressing was that of tables
that are divided over multiple database clusters. The aim of parallelization
techniques, in this case, would be to remove the strain of dealing with
distributed databases from the PHP application.

Lukas Smith shared some insights about existing databases. The pgsql
extension already has support for asynchronous queries in
pg_send_query() and friends. It should be possible to use MySQL
Proxy to create something that splits a single query into multiple queries
and then rejoins them. Finally, since MySQL AB are actively developing a
native PHP library, Arend might want to talk to them about his ideas.

Arend replied that, while Lukas’ suggestion of query splitting was exactly
what he hoped to achieve, MySQL Proxy didn’t appear to be the best way of
approaching it. Adding another proxy layer between the web servers and the
database servers wouldn’t only mean additional overhead; it would bring new
potential bottlenecks and points of failure. That was precisely why Arend
hoped to move the functionality onto the web servers themselves. That said,
some of MySQL Proxy’s functionality is exactly the kind of thing he needs,
and contacting MySQL AB about the possibility of re-using some of its
components might well be a good idea.

Rasmus Lerdorf recommended a simple single-threaded event-driven approach,
and suggested that Arend look into the source behind the
curl_multi_* implementation, since he is essentially planning to
do the same thing. Writing a threaded library would mean dealing with a lot of
issues (portability, threading clashes, signal handling and so on), and Rasmus
didn’t see how intra-request thread scheduling would help any when it came to
busy web servers.

Short version: This sounds like a scarily big project.

CVS: T_IMPORT vs T_USE resolved, yay

Changes in CVS that you should probably be aware of include:

  • GD library bug #43121
    (gdImageFill() with IMG_COLOR_TILED crashes httpd)
    was fixed in PHP_5_2, PHP_5_3 and CVS HEAD [Mattias Bengtsson]
  • In the core, the copy() function’s optional third
    parameter context was backported to the 5_3 branch [Jani
    Taskinen]
  • Core bug #43197
    (array_intersect_assoc() does not emit warning messages for
    error inputs) was fixed in 5_3 and HEAD [Ilia]
  • Following the addition of zend_mm_set_custom_handlers() in
    the Zend API, user defined malloc(), realloc() and
    free() are supported in PHP_5_3 and CVS HEAD (affects internals
    only) [Dmitry]
  • T_IMPORT is now T_USE [Dmitry]
  • There is now a glob stream wrapper, glob://, in PHP_5_3
    branch and CVS HEAD [Marcus]
  • Core bug #43196
    (array_intersect_assoc() crashes with non-array input) was fixed
    in 5_2, 5_3 and HEAD [Jani]
  • Zend Engine bugs #43201 (Crash
on using uninitialized vals and __get()/__set()) and #43175 (__destruct()
    throwing an exception with __call() causes segfault) were fixed
    in the PHP_5_3 branch and CVS HEAD [Dmitry]
  • Streams bug #43216
    (stream_is_local() returns FALSE on
    file://) was fixed in 5_3 and HEAD [Dmitry]
  • A bunch of ext/interbase bugs were finally fixed and/or closed
    in the PHP_5_3 branch and CVS HEAD – #30690, #30907, #32143, #39056, #39397, #39700 and #42284. See history for details.
    [Lars Westermann]
  • Following the PHP 5.2.5 release, the fixes for Zend Engine bugs #43175 (__destruct()
throwing an exception with __call() causes segfault) and #43201 (Crash on using uninitialized vals
    and __get/__set) and streams bug #43216
    (stream_is_local() returns FALSE on
    file://) were merged to the 5_2 branch [Dmitry]
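Here is a minimal sketch of the new glob:// wrapper, which requires PHP 5.3 or later (the sketch itself uses type declarations from later versions). DirectoryIterator accepts a glob:// pattern and iterates only over the matching entries. The helper `matching_files()` is hypothetical.

```php
<?php
// Hypothetical helper: list the file names in $dir that match $pattern,
// using the glob:// stream wrapper.
function matching_files(string $dir, string $pattern): array
{
    $names = [];
    $it = new DirectoryIterator('glob://' . rtrim($dir, '/') . '/' . $pattern);
    foreach ($it as $entry) {
        $names[] = $entry->getFilename();
    }
    sort($names);
    return $names;
}

print_r(matching_files(sys_get_temp_dir(), '*.ini'));
```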

In other CVS news, Jani Taskinen added support for special
[PATH=/opt/httpd/www.example.com/] and
[HOST=www.example.com] INI sections; these are intended for
admins, and cannot be overridden in user-defined INI files. He also
backported support for loading modules using full paths, via the
extension directive.

Andrei Zmievski welcomed Bob Majdak, another new contributor, into the
PHP-GTK fold. Wez did something far more mysterious: a new module named
php-objc appeared overnight in the php.net CVS repository.

Short version: An Objective-C bridge appears without fanfare – and
there will definitely be a PHP 5.2.6.

PAT: __sleep

Sara Golemon helped David Zülke polish up his patch
adding a new option, ignore_errors, for the HTTP
fopen wrapper. This is now in CVS HEAD and the PHP_5_3 branch,
and offers a way to pick up HTTP response headers regardless of status.
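Here is a minimal sketch of the new option, assuming a PHP version that includes it and using a placeholder URL. With ignore_errors set in the http context options, file_get_contents() returns the response body even for 4xx/5xx statuses. The HTTP wrapper fills $http_response_header in the calling scope, and the status can be read from that array. The helper names are hypothetical. After redirects, the array holds one status line per response, so the parser keeps the last one.

```php
<?php
// Hypothetical helper: extract the final status code from the lines of
// $http_response_header, e.g. "HTTP/1.1 404 Not Found" -> 404.
function http_status_code(array $headers): ?int
{
    $status = null;
    foreach ($headers as $line) {
        // One status line per response; the last belongs to the final one.
        if (preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m)) {
            $status = (int) $m[1];
        }
    }
    return $status;
}

// Hypothetical helper: fetch a URL, keeping the body on error statuses.
function fetch_with_status(string $url): array
{
    $context = stream_context_create([
        'http' => ['ignore_errors' => true], // don't fail on 4xx/5xx
    ]);
    $body = file_get_contents($url, false, $context);
    // Set in this scope by the HTTP wrapper (absent if no response).
    $headers = $http_response_header ?? [];
    return [http_status_code($headers), $body, $headers];
}

// Usage (placeholder URL):
// [$status, $body] = fetch_with_status('http://www.example.com/missing-page');
```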

One Andrew Minerd offered a patch against CVS
HEAD and the PHP_5_3 branch that would allow the magic __sleep()
function to return NULL ‘to continue the normal serialization
process’. This, he wrote, would allow the function to clean up without
having to resort to the Reflection API. Andrew had checked his patch against
the test suite, and included a separate patch correcting a wrong
EXPECTF in one of the existing tests.
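For context, here is roughly what cleanup in __sleep() involves without such a patch, sketched with a hypothetical class. __sleep() must return the names of the properties to serialize, so a method that only wants to release a resource still has to rebuild that list itself. In this sketch, get_object_vars() supplies every property visible from the class’s own scope. Private properties declared in a parent class would not appear in that list, which is presumably where the Reflection API comes in.

```php
<?php
// Hypothetical class holding a resource that cannot be serialized.
class CachedReport
{
    public $title;
    protected $rows = [];
    private $logHandle;

    public function __construct(string $title, array $rows)
    {
        $this->title = $title;
        $this->rows = $rows;
        $this->logHandle = fopen('php://memory', 'w+');
    }

    public function rows(): array
    {
        return $this->rows;
    }

    public function __sleep()
    {
        // Cleanup: release the resource before serialization.
        if (is_resource($this->logHandle)) {
            fclose($this->logHandle);
        }
        $this->logHandle = null;

        // The property list must still be built by hand: everything
        // visible from this scope, minus the handle.
        $names = array_keys(get_object_vars($this));
        return array_values(array_diff($names, ['logHandle']));
    }
}

$copy = unserialize(serialize(new CachedReport('Q4', [1, 2, 3])));
var_dump($copy->title, $copy->rows());
```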

Johannes Schlüter applied a Zend Engine patch from Andrey Hristov
bringing persistency support to zend_ptr_stacks (affects
internals only).

Stas went looking for ignored patches (this does happen from time to time)
and came across Martin Jensen’s patch
to unify phpinfo() output for PDO drivers. He wondered
why the patch was PECL-specific when PDO is not, and moved on to Wez Furlong’s
large file support (LFS) patch. Stas still had some concerns stemming from 64-bit
compatibility and the change to the FILE* structure, and
wondered if Wez knew how LFS-enabled code copes with non-LFS-enabled code.
Again, there was no response; the original patch being three weeks old, it’s
likely Wez missed the query.

And finally, Johannes committed a trivial patch from imagick developer
Mikko Koppanen, replacing #ifdef with #if defined in
request initialization/shutdown in the PHP_5_2 branch of ext/mysqli.

Short version: LFS needs serious research before it stands a
chance.