REQ: Sandboxed semaphores
REQ: ext/fileinfo [continued]
TLK: Focus on *printf [continued]
TLK: Return value of convert_to_* [continued]
NEW: PHP 5.2.1 RC2
NEW: PHP 4.4.5 RC1
TLK: FastCGI, CLI optimized for win32
TLK: The new Windows build
CVS: It's all in the HEAD
PAT: OpenSSL improvements

REQ: Sandboxed semaphores

One Wojciech Malota wrote to internals@ at the beginning of the week with a complaint, and a suggestion. He explained that it isn't currently possible to safely remove semaphores with sem_remove() if they are being used to execute concurrent processes, because there's no way to know if another process has acquired a recently released semaphore. For that reason, he can't call sem_remove() on semaphores that have been released. However, there is a limit on the number of semaphores allowed, and executing sem_get() after the 128th semaphore triggers a warning.

Wojciech led us on a journey towards his solution for this problem. He began by offering up his own pseudo version of sem_remove():

function my_sem_remove($id) {
    if (Semaphore $id isn't acquired and its FIFO is empty) {
        sem_remove($id);
    }
}

allowing him to do something like:

$id = sem_get(ID);
sem_acquire($id);

// CRITICAL REGION

sem_release($id);
my_sem_remove($id);

This could be simplified further:

function my_sem_release($id) {
    sem_release($id);
    my_sem_remove($id);
}

$id = sem_get(ID);
sem_acquire($id);

// CRITICAL REGION

my_sem_release($id);

However, it would still be possible at this stage to call sem_acquire($id) on a non-existent semaphore; there needed to be a guarantee that neither sem_get() nor sem_acquire() could be interrupted:

function my_sem_acquire($key, $max_acquire = 1, $perm = 0666, $auto_release = true) {
    $id = sem_get($key, $max_acquire, $perm, $auto_release);
    sem_acquire($id);
    return $id;
}

Safe usage would be as simple as:

$id = my_sem_acquire(SEMKEY);

// CRITICAL REGION

my_sem_release($id);

The only problem was that this couldn't be written in userland code, and Wojciech has no C skills.

Internals newbie Michael Allen - who turned out to be a bit of an expert on semaphores - responded, initially saying that it wouldn't be possible to clean them up properly and the only solution was to use the process-shared POSIX kind. A few emails later he realized he'd badly misunderstood the problem Wojciech was attempting to describe, and started over. Only the application would be in a position to know when a semaphore can safely be removed, so application-specific logic would be needed to trigger semaphore removal; 'for example, you could have a "boss" process that collects the exit status of each "worker". When the boss process has an exit status for all workers the semaphore can be removed'. Wojciech thought, in that case, the PHP interpreter itself should be a "boss", since scripts couldn't 'know' enough to take the decision. That way, if a script called my_sem_release(), the semaphore would be removed and another script calling sem_get() via my_sem_acquire() wouldn't cause an error, since it would create the semaphore once more. This solution appeared safe to him, and he regretted that his lack of knowledge about PHP source code prevented him implementing the behaviour. He concluded that the key to success would be in refusing any interruptions to instructions within the acquire and release functions.
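
Michael's "boss" suggestion might look roughly like the sketch below in userland. It is only an illustration and was not posted to the list: it assumes a Unix-like CLI with both the sysvsem and pcntl extensions loaded, and the function name run_boss() and the key value are hypothetical.

```php
<?php
// Hypothetical sketch of the "boss" pattern; run_boss() is not part of PHP.
// Assumes the sysvsem and pcntl extensions on a Unix-like CLI.
function run_boss(int $workerCount, int $key = 0x70687031): array
{
    // The boss creates (or attaches to) the semaphore before any worker exists.
    $sem = sem_get($key, 1, 0600);
    if ($sem === false) {
        throw new RuntimeException('sem_get() failed');
    }

    $pids = [];
    for ($i = 0; $i < $workerCount; $i++) {
        $pid = pcntl_fork();
        if ($pid === -1) {
            throw new RuntimeException('pcntl_fork() failed');
        }
        if ($pid === 0) {
            // Worker: take the semaphore, do the protected work, release it.
            $ok = sem_acquire($sem);
            // CRITICAL REGION
            if ($ok) {
                sem_release($sem);
            }
            exit($ok ? 0 : 1);
        }
        $pids[] = $pid;
    }

    // Boss: collect one exit status per worker.
    $statuses = [];
    foreach ($pids as $pid) {
        pcntl_waitpid($pid, $status);
        $statuses[$pid] = pcntl_wexitstatus($status);
    }

    // Every worker has exited, so nobody can still be waiting on the semaphore.
    sem_remove($sem);

    return $statuses;
}
```

The point of the design is that sem_remove() is called exactly once, by the only process that knows all users of the semaphore have finished. That is precisely the knowledge Wojciech argued an individual script doesn't have.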

Short version: Only an idea - or is anyone interested in implementing it?

REQ: ext/fileinfo [continued]

Andrey Hristov continued the discussion over the meaning of PECL with an observation that many of those installing PHP might believe that by running ./configure either as-is or by including all the --enable|--with options found in the configure script, they had provided everything. It is entirely possible to install PHP without ever being aware of the existence of PECL until some process requires installation of a PECL extension.

Kevin Waterson was still battling for the future of ext/fileinfo, and cared less about PECL per se. He saw file MIME type validation as 'quite a generic procedure', and asked if anyone else felt this way about it? Pierre-Alain Joye and Jochem Maas both did; Jochem wrote that, although to him it's no problem to type

pecl install fileinfo

the majority of PHP users don't have that privilege. He saw fileinfo as a way of providing encouragement to check more than just the three-letter extension of uploaded files. Derick Rethans and user Mike Robinson backed this view; Mike termed it 'very basic functionality'. Derick wrote that one of the advantages of PECL is that it allows extensions to be swapped into and out of the core as appropriate, but went on to say that no hosters install PECL modules ad hoc. He personally felt fileinfo should be not only part of the core but turned on by default - and possibly even bundled, if some effort were put into improving the extension.
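
As an illustration of the kind of check Jochem had in mind, here is a minimal sketch that inspects a file's content rather than trusting its name. It assumes ext/fileinfo is loaded and a PHP version that provides the FILEINFO_MIME_TYPE constant; detect_mime_type() and is_allowed_upload() are hypothetical helper names, not part of the extension.

```php
<?php
// Hypothetical helpers; only finfo and FILEINFO_MIME_TYPE come from ext/fileinfo.
function detect_mime_type(string $path): ?string
{
    if (!class_exists('finfo') || !is_file($path)) {
        return null; // extension missing or nothing to inspect
    }
    $finfo = new finfo(FILEINFO_MIME_TYPE);
    $type = $finfo->file($path);

    return $type === false ? null : $type;
}

function is_allowed_upload(string $path, array $allowedTypes): bool
{
    // Decide on the detected content type, never on the file extension.
    $type = detect_mime_type($path);

    return $type !== null && in_array($type, $allowedTypes, true);
}
```

A typical call might be is_allowed_upload($_FILES['avatar']['tmp_name'], array('image/png', 'image/jpeg')), where 'avatar' is a made-up form field name.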

Short version: The fileinfo module has fans in high places.

TLK: Focus on *printf [continued]

Andrei Zmievski wanted to know whether anyone else - and particularly Matt Wilmas - thought printf() should use locale-aware formatting by default, given that POSIX locales other than en_US are deprecated in PHP 6 Unicode mode. Matt agreed with him that it didn't seem a good idea, particularly given that that function family hadn't been locale-aware in the first place. That said, his only real issue with locale-awareness was that introducing it had created a bug in number_format(), amongst others. Pierre also agreed on that point, noting that a lot of extensions rely on the behaviour of printf() and 'we cannot change it just for "fun"'.
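
For readers who want to see the distinction in code, the sketch below relies on the behaviour described in the PHP manual, where the %F conversion is documented as never locale-aware while %f may follow the current numeric locale. The variable names are ours, and the snippet deliberately doesn't call setlocale(), so only the locale-independent results are shown.

```php
<?php
$amount = 1234.5;

// %F is documented as non-locale-aware: the decimal separator is always a dot.
$plain = sprintf('%.2F', $amount);          // "1234.50"

// %f may use the separator of the current LC_NUMERIC locale, if one was set.
$maybeLocalized = sprintf('%.2f', $amount);

// Code that wants particular separators can ask for them explicitly.
$formatted = number_format($amount, 2, ',', '.');   // "1.234,50"
```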

Short version: Goodbye to *printf locale-awareness.

TLK: Return value of convert_to_* [continued]

On another front, Andrei continued his discussion with Tony Dovgal over the behaviour of the internal conversion functions. The only way, wrote Andrei, that a failure in zend_parse_parameters() would prevent the function being executed would be when users blindly passed an object somewhere a string is expected. It wouldn't break anything 'for those objects that know how to convert themselves'.

Tony still didn't see any good reason for breaking the existing behaviour, given that there are no technical reasons to prevent it. To Andrei, though,

echo substr($obj, 0, 3);

shouldn't have resulted in

Obj

in the first place...

Marcus Börger intervened to say that there had been plenty of discussion on this issue in the past, and the consensus was that the only reason to return Object in earlier versions of PHP 5 had been that __toString() wasn't fully working back then. 'The current 5.2/6.0 behavior is what we wanted in the first place and hence correct, or do we need to restart the discussion again?' Tony wrote rather bitterly that he could name quite a number of things that seem broken (as in aesthetically wrong) to him, but he realized he couldn't fix them at whim. Andrei wrote that he was fine with the behaviour as-is, given that it throws a catchable fatal error.
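
The difference between objects 'that know how to convert themselves' and those that don't can be sketched as below. The class and function names are hypothetical, and the exact failure mode for the second case depends on the PHP version (a catchable fatal error in 5.2, as discussed here; a warning or a TypeError in later releases), so the helper simply treats any failure as 'no result'.

```php
<?php
// Hypothetical classes for illustration only.
class Label
{
    private $text;

    public function __construct($text)
    {
        $this->text = $text;
    }

    // This object knows how to convert itself to a string.
    public function __toString()
    {
        return $this->text;
    }
}

class Opaque
{
    // No __toString(): passing this where a string is expected is an error.
}

function first_three($value)
{
    try {
        return substr($value, 0, 3);
    } catch (Throwable $e) {
        // Versions that throw end up here; versions that warn make substr() return null.
        return null;
    }
}
```

With these definitions first_three(new Label('Hello')) yields the first three characters of the label, whereas first_three(new Opaque()) no longer quietly produces 'Obj'.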

Short version: It's still a catchable fatal error.

NEW: PHP 5.2.1 RC2

Ilia Alshanetsky, as Release Master for the PHP 5.2 series, announced the second release candidate for PHP 5.2.1 as follows:

Since the last release over 30 different bug fixes were made and the
two pending patches relating to is_numeric_string() optimization and
internal heap protection for the Zend Engine allocator were added.
Another important change was the fact that the memory limit is now
always enabled and to accommodate this change the default limit value
was raised to 128 megabytes to avoid script breakage. We do not
anticipate any regressions to be introduced by this RC, but I would
still like to ask everyone to take a few minutes and test it against
their code base. If you come across any issues please report them at
http://bugs.php.net/.

With this release we are nearing the final stretch is the release cycle,
so I'd like to ask all developers to refrain from making any commits to
the 5.2 tree that are not bug fixes. If all goes well the final RC (RC3)
will be available in 2 weeks to be shortly followed by the final release.

The source tarballs can be found at:

http://downloads.php.net/ilia/php-5.2.1RC2.tar.bz2
 (md5: cc6024531e3d4058e32cf740e2fe535f)
http://downloads.php.net/ilia/php-5.2.1RC2.tar.gz
 (md5: 3f89c31687762a39f1360b380dd315b4)

and the Window binaries made available by Edin Kadribasic at:

http://downloads.php.net/edink/php-5.2.1RC2-Win32.zip
 (md5: aaabe7eda5cef1be6f9c51c621fbbfd3)
http://downloads.php.net/edink/pecl-5.2.1RC2-Win32.zip
 (md5: 6f97b2365bfa56874801fa5053d387e0)
http://downloads.php.net/edink/php-debug-pack-5.2.1RC2-Win32.zip
 (md5: 592f2fc11c9ed4d891fb17b9e967441c)

PHP user Jan Schneider was quick to complain that ext/ming doesn't compile against this release candidate, and presented a page full of compiler errors detailing every new function and constant Frank Kromann added support for recently. Nuno Lopes wrote that the ming extension currently neither compiles with older versions of libming nor prevents the attempt... at the very least, the required libming version needed updating. Frank evidently wasn't around to see either that mail or the one Nuno had sent highlighting the problem earlier, so Nuno took the liberty of fixing the issue himself later in the week and the extension now supports older versions.

Both Firman Wandayandi and Lukas Smith reported problems with loading extensions under Apache on Windows - of which, more later.

Short version: Problems with win32 in this release candidate - read on.

NEW: PHP 4.4.5 RC1

Wearing his PHP 4.4 series Release Master hat, Derick announced that the first release candidate for PHP 4.4.5 is now available for testing:

Please test it carefully, and report any bugs in the bug system, but
only if you have a short reproducable test case.

If everything goes well, we will probably release PHP 4.4.5 before the
end of the month.

http://downloads.php.net/derick/php-4.4.5RC1.tar.bz2
 (md5: ee9238175c6b6ecec8712954065451c4)

http://downloads.php.net/derick/php-4.4.5RC1.tar.gz
 (md5: 5337c72e3d70fb88b932215957e250f8)

Edin's binaries for Windows are available at:

http://downloads.php.net/edink/php-4.4.5RC1-Win32.zip

Christian Schneider, running make test for the first time, wondered what he should do with results showing that some bugs had been fixed in PHP 5 but not in the PHP_4_4 branch? Two of the bugs in question being from ext/gd, Pierre explained that he no longer automatically applies patches to the PHP_4_4 branch; any requests for GD fixes there should go to Derick.

Short version: A fairly smooth RC roll-out. Please test.

TLK: FastCGI, CLI optimized for win32

Edin, meanwhile, had decided to experiment with dropping thread safety in non-threaded SAPIs such as CGI/FastCGI and CLI. With his initial benchmarks showing a 20-30% performance increase for PHP 5.2.1 RC2, he released the binaries for testing:

http://downloads.php.net/edink/php-5.2.1RC2-nts-Win32.zip
 (76aa90a7fdb0bd2eb62c1172501d6c6e)
http://downloads.php.net/edink/pecl-5.2.1RC2-nts-Win32.zip
 (a493bdf794a5d44d749f6dcd2a55f9da)
http://downloads.php.net/edink/php-debug-pack-5.2.1RC2-nts-Win32.zip
 (cbfd474fcdb61522d4c750b5c02d3df9)

Vincent Dupont wanted to know which version(s) of IIS/win32 should use these binaries? Edin looked into it, and came back with the cheering information that Microsoft have released their native FastCGI module - built into IIS 7 - for IIS 5.x and up.

Short version: A major speedup for FastCGI under Windows.

TLK: The new Windows build

Following all that heady success, Edin found himself unable to get Microsoft Visual Studio 2005 to compile a PHP server module capable of loading PHP extensions. Having looked around other open source projects in search of clues, he'd discovered that pretty much everyone else, including Apache, was still using MSVC++ 6.0 for their builds, thereby eliminating 'all the hassle with bundling C runtime etc'. Edin thought perhaps PHP should do the same. Ilia agreed that it was too late in the release cycle to start experimenting, and suggested revisiting the issue at a less vital juncture - assuming there was some quantifiable benefit in using more recent versions of MSVC++ in the first place. Apache developer William A. Rowe agreed that this was a sensible approach, but suggested the problem could lie in incorrect .manifest data packaging - possibly because individual extensions require an entry for php5ts.dll, but he had a few questions for Edin before he could be certain of that. For Ilia, William added that the 'quantifiable benefit' of using VC++8 is that it's a freely available download with a more familiar interface than most open source compilers can offer Windows users. Edin pointed out (mainly to Marcus, who was pushing for the newest possible compiler version) that the problem with upgrading the PHP compiler is mainly that Apache's compiled with VC++6.

Andi Gutmans wrote that there are significant performance improvements to be gained from using VC++8, but he agreed with the general consensus that the upgrade should be well timed. He offered Edin help to 'get this puppy ported and update the build', suspecting that third party libraries were to blame for the problems. Edin explained that the problem about Apache binaries being compiled with VC++6 is that they load the wrong C runtime. The only solution he'd found to date was to supply apache.exe.manifest in Apache's bin directory and use it to force Windows to load Apache with msvcr80.dll as a dependency rather than msvcrt.dll. That said, Edin figured 'it probably isn't good idea to force a different C runtime on Apache like this'. He added, almost as an aside, that the performance increase using Zend/bench.php was minuscule compared with that to be gained by disabling thread support. This surprised Andi, who promptly shared his own results, although he later noted that these had been gained by utilizing profile-guided optimizations in non-threaded builds.

William explained why the Apache team aren't rushing to upgrade their compiler. Firstly, the server is intended to be interoperable with ActivePerl and ActivePython, both of which are compiled under MSVC++ 6.0; and secondly, systems without the VC80 C runtime (i.e. pre-XP) wouldn't be able to use it. There are Apache VC80 binaries available, but they aren't part of the official distribution, and an upgrade is highly unlikely to be considered until at least Apache 2.4 due to binary compatibility concerns.

Edin meanwhile had decided it was far less of a headache to stay with good old VC6 and - with his CPU 'still smoking in the corner' - announced the availability of the PHP 5.2.1 RC2 binaries, thread-safe (TS) and otherwise (NTS - for FastCGI/CLI optimization). He added a warning that the NTS binaries are not compatible with any third party DLL available on the net.

Wez Furlong appeared out of nowhere to suggest that the manifest for mod_php*.dll might need altering to make the correct CRT load. He believed it had something to do with the resource number used, but wasn't absolutely certain of that. Edin was fairly sure that his existing code should embed the manifest. That said, he still couldn't see why it might work fine for the executables but not for the loadable SAPI modules, much less why it should appear to work fine right up until PHP attempts to load an extension. Rob Richards reported that his own experiment with MSVS 2005 and Apache 2.0.59 had been fine, but changing the Apache configuration to Edin's setup and then trying to load PDO broke it. He'd also tried to run nmake test, which had failed in the VS8 build (only) because it broke proc_open(), used in the test suite script. Then again, he was running his tests on a dual core machine, which might also have some bearing on the proc_open() issue... Wez wrote that he'd seen similar problems in the past when attempting to distribute modules that use the debug version of the CRT, itself not redistributable. He wondered if that was somehow getting into Edin's release builds?

Rob came back later with the news that the modules themselves were the problem in Edin's build - they had no manifest embedded. He thought the CGI/CLI executables were probably working despite this because they load msvcr80.dll (the C runtime) directly from the host system (where it exists), so don't actually need a manifest to provide it.

Short version: It's going to be a while before Edin's happy to distribute binaries compiled with VC++8.

CVS: It's all in the HEAD

Changes in CVS that you should probably be aware of include:

  • Core bug #40009 (http_build_query(array()) returns NULL) was fixed [Ilia]
  • Following PHP 5.2.1 RC2, extension bugs #39979 (PGSQL_CONNECT_FORCE_NEW causes next connect to establish a new connection) and the marginally less challenging #39394 (Missing check for older variants of openssl) were fixed in the 5_2 branch (only) [Ilia]
  • XMLWriter bug #39504 (xmlwriter_write_dtd_entity() creates Attlist tag, not entity) was fixed [Hannes Magnusson]
  • SPL bug #40036 (empty() does not work correctly with ArrayObject when using ARRAY_AS_PROPS) was fixed [Ilia]

Dmitry Stogov had been looking into proc_open(), and committed an improvement allowing the function to run external commands on win32 systems without calling cmd.exe. Nuno Lopes wondered whether it would be possible to drop the need for the shell under Unix-like systems too? Dmitry - who had had similar thoughts - explained that it would be too complex a change so late in the release cycle, with a high risk of introducing new bugs. He hoped to make time for it after the PHP 5.2.1 release.
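
The PHP manual documents a Windows-only bypass_shell entry in proc_open()'s options array, which appears to be how this change is exposed to scripts; treat that link as our assumption rather than something stated in the commit. A minimal sketch, with run_command() as a hypothetical helper name:

```php
<?php
// Hypothetical helper; proc_open() and the bypass_shell option are documented PHP features.
function run_command(string $command): ?string
{
    // Capture only the child's stdout; stdin and stderr are inherited.
    $descriptors = array(1 => array('pipe', 'w'));

    // bypass_shell is documented as Windows-only and has no effect elsewhere.
    $options = array('bypass_shell' => true);

    $process = proc_open($command, $descriptors, $pipes, null, null, $options);
    if ($process === false) {
        return null;
    }

    $output = stream_get_contents($pipes[1]);
    fclose($pipes[1]);
    proc_close($process);

    return $output;
}
```

Because no shell is involved on Windows when the option is set, the command has to name a real executable; shell built-ins and redirection operators won't work there.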

Marcus started 2007 with a commit to 'make Andrei happy', marking the low-hanging fruit in ext/spl as Unicode-friendly.

The rest of the week definitely belonged to Sara Golemon. The session support got a good old-fashioned code cleanup 'so that I can do a Unicode update without going insane'. Whitespace and brackets having been duly inserted in the prescribed manner, the actual upgrade came through a few hours later. Sara followed this up by allowing ext/session to use the algorithms from ext/hash, when available, to generate session IDs. She then turned her attention to the Zend Engine. The PHP userspace function create_function() is now Unicode-aware as a result, and for internals there are four new persistent memory allocation functions (peustrdup(), peustrndup(), pezstrndup(), pestrndup()) and the 'plain vanilla' zend_ustrdup() in the Zend API. Sara went on to add and expose add_property_zstr[l]_ex() there before going back to her roots; the userland stream_*() filter functions are now well and truly 'Unicodified'.

Short version: Take a look at proc_open() in PHP_5_2 branch. It's probably the first function in the history of PHP development to perform best on Windows.

PAT: OpenSSL improvements

One John Bafford turned up with a patch and corresponding test file introducing three new functions into the PHP core:

mixed array_key_index(array input, int index [, mixed value])
// Return the array's index'th key (and optionally, value)

mixed array_first(array input [, mixed value])
// Return the array's first key (and optionally, value)
// This is equivalent to array_key_index($input, 0 [, $value]);

mixed array_last(array input [, mixed value])
// Return the array's last key (and optionally, value)
// This is equivalent to array_key_index($input, -1 [, $value]);

He hoped to see these in PHP 5.2.2.

Sara Golemon was in favour, but pointed out that new functionality usually is limited to minor versions rather than patch level releases; John's patch would need to wait until either PHP 6.0 or PHP 5.3.0. Andrei and Tony were less impressed; they both argued that the functionality, being very simple to implement in userspace, doesn't warrant the cost of their upkeep in the core. Brian Moon and Arnold Daniels both argued for the patch, which is now being held in the PAT directory pending wider review.
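
To give an idea of what 'very simple to implement in userspace' means here, the following is a rough userland equivalent. It is our own sketch, not John's patch: the functions carry a my_ prefix to avoid clashing with any built-ins, the optional value is assumed to be passed by reference, and returning NULL for an out-of-range index is likewise an assumption.

```php
<?php
// Hypothetical userland equivalents; not the code from the proposed patch.
function my_array_key_index(array $input, int $index, &$value = null)
{
    $keys = array_keys($input);
    $count = count($keys);

    // Negative indexes count back from the end, so -1 is the last element.
    if ($index < 0) {
        $index += $count;
    }
    if ($index < 0 || $index >= $count) {
        $value = null;
        return null; // assumed behaviour for an out-of-range index
    }

    $key = $keys[$index];
    $value = $input[$key];

    return $key;
}

function my_array_first(array $input, &$value = null)
{
    return my_array_key_index($input, 0, $value);
}

function my_array_last(array $input, &$value = null)
{
    return my_array_key_index($input, -1, $value);
}
```

The sketch copies all of the array's keys on every call, so a native implementation could be cheaper for large arrays.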

Moritz Bechler, who had been working on a separate extension offering OpenSSL CRL support, offered a patch to integrate CRL support into ext/openssl instead. He wrote that applying his patch alongside Marc Delling's PKCS12 patch (currently in PAT) would make it possible to write 'basic but real' PKI-CA applications in PHP. Although recognizing that this was a minority concern, Moritz felt that the same could be said of most of the openssl extension beyond the stream wrapper, and mentioned the possibility of splitting it up.

Pierre replied that Marc's patch is 'in the queue' and that Moritz should open a feature request for his own. He personally felt there was a need for fuller OpenSSL support, and planned to add pending patches to the next PHP 5.x development branch and CVS HEAD following the 5.2.1 release. Pierre thanked Moritz for his patch, which has now been posted as feature request #40046.

Two bugs, #40012 (php_date.c doesnt compile on Netware) and #37619 (proc_open() closes stdin on fork() failure) were fixed when Derick and Nuno respectively applied patches supplied by the reporters of those bugs. The latter was fixed in the PHP_5_2 branch only.

Short version: The OpenSSL extension is about to gain a great deal.