TLK: The accidental death of --disable-all
REQ: 5.2 serialization change [continued]
TLK: INI includes
RFC: Moving core extensions to PECL
TLK: PHP 5.2.1 status
RFC: unset($object)
CVS: Ming API upgrade, PDO in HEAD
PAT: make test -n, CV experiments

TLK: The accidental death of --disable-all

It was one of those weeks. First there was a bundle of threads completely unrelated to PHP core development, and then there was Andrey Hristov's funny moment. Andrey had decided to disable a few extensions to speed up his development efforts, and had discovered that --disable-all was 'kind of borked' in CVS HEAD. PHP wouldn't build, complaining about Unicode stuff. OK, so Andrey went to disable extensions on an individual basis, starting with the XML support:

'--disable-xml'
'--disable-libxml'
'--disable-xmlreader'
'--disable-xmlwriter'
'--disable-dom'

SPL suddenly started complaining, and PHP still wouldn't build:

'--disable-spl'

Now he couldn't build ext/standard, because count() relies on SPL. Maybe it's time to make disabling SPL a non-option... Further experimentation showed that the CLI SAPI can't build if ext/reflection is disabled, either, and at that point Andrey gave up and mailed the list.

Ilia Alshanetsky wrote bemusedly that there is absolutely no reason for count() to require SPL. Besides, it should definitely be possible to compile PHP with --disable-all and no other flags, perhaps with the exception of ICU in CVS HEAD 'for obvious reasons'. Andrey posted the error message that had prompted him to mention count() in the first place:

ext/standard/array.o(.text+0x6c2): In function `zif_count':
/home/andrey/dev/6/ext/standard/array.c:316: undefined reference to `spl_ce_Countable'

'make clean!' chorused Hannes Magnusson and Tony Dovgal in unison.

Short version: Sometimes the biggest problems are the simplest to fix.

REQ: 5.2 serialization change [continued]

Andrei Zmievski realized that his patch had zero chance of getting into the PHP_5_2 branch if it broke back compatibility. He wrote sadly that he couldn't see a way to make decoding work for all cases in both CVS HEAD and PHP 5; 'I guess we'll have to leave this task to PHP_Compat'. Thomas Seifert asked if he was kidding. Why would he even think it might be OK to break all the strings stored as serialized in the PHP 5.x series? He could understand that kind of breakage in PHP 6, but 'not in some minor release'... Robert Cummings stepped up to explain to Thomas (and anyone else who'd missed Ilia's rejection of the patch) that this is a forward compatibility issue. If people don't mind not being able to read serialized data from future PHP versions in 5.2 and over, there is no BC problem. He proposed checking the start of the serialized string for a version indicator prior to decoding. If no version indicator is found, unserialize() should fall back to the original semantics.

Stefan Esser had a better idea. He suggested introducing a new variable type, S, for serialized strings. Zeev Suraski pondered that for a while before responding that he actually couldn't think of a single reason why not. Could anyone else? Ilia could; he pointed out that older versions of PHP would be totally incapable of parsing a PHP 6 serialized string, creating problems for anyone using serialization as a means of passing data between applications. He thought the problem would be worse if the new type were added to a pre-6.0 release. Stefan replied that the idea was to enable PHP 5.2.x to read data serialized with PHP 6, not to enable PHP 5.2.x to generate it.

Andrei fired off a haughty note to Thomas explaining that the BC break would occur if you used PHP 6 serialized content in PHP 5 - not the other way around. It isn't possible to use the same escape format in PHP 5.2, for BC reasons; the support would need to come from PHP_Compat, as he'd originally noted.

PHP user Chad Daelhousen had a plan, too. He wanted a version indicator at the start of strings serialized in PHP 6 as per Robert's suggestion; no changes to serialize() in earlier PHP versions; an optional flag for serialize() in PHP 6 to force old-style serialization; and unserialize() in PHP 5.2.x to understand the PHP 6 version indicator.
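The version-indicator idea from Robert and Chad can be sketched in userland. This is only an illustration under stated assumptions: the thread did not settle on a marker format, so the 'PHP6:' prefix, the function names and the placeholder decoder below are all hypothetical.

```php
<?php
// Hypothetical marker; no format was agreed on the list.
define('HYPOTHETICAL_V6_PREFIX', 'PHP6:');

// Placeholder for a decoder that understands the newer escape format.
// Here it simply delegates to unserialize().
function decode_v6_payload($payload)
{
    return unserialize($payload);
}

// Check for the version indicator first; fall back to the original semantics.
function unserialize_compat($data)
{
    $len = strlen(HYPOTHETICAL_V6_PREFIX);
    if (strncmp($data, HYPOTHETICAL_V6_PREFIX, $len) === 0) {
        return decode_v6_payload(substr($data, $len));
    }
    return unserialize($data);
}

// Optional flag to force old-style output, as in Chad's plan.
function serialize_compat($value, $old_style = false)
{
    $body = serialize($value);
    return $old_style ? $body : HYPOTHETICAL_V6_PREFIX . $body;
}
```

Output from serialize() always begins with a type letter such as s, i, a or O, so a marker starting with any other character cannot be mistaken for old-style data.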

Short version: Everybody wants to save the world.

TLK: INI includes

Brian Shire had made a patch to support INI includes before he came across a prior discussion of them in the MARC archives. He wrote that INI includes would be a great help in simplifying large configurations, particularly when using a configuration management tool. Was this still of interest to the current PHP team?

Mathieu Carbonneaux wrote that he'd made a similar patch making it possible to modify the configuration of scandir for additional INI files. He found the --with-config-file-scan-dir configure option very useful, but limited. His solution, on the other hand, allowed him to dynamically modify the directories to be scanned or to add an include parameter in php.ini.
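Neither patch is shown here, but PHP can already report which INI files it picked up from the scan directory. The sketch below uses php_ini_scanned_files() and php_ini_loaded_file(), the latter only available in later 5.2 releases; the helper name scanned_ini_files() is made up for the example.

```php
<?php
// Returns the extra INI files PHP parsed from its scan directory.
function scanned_ini_files()
{
    $raw = php_ini_scanned_files();   // comma-separated string, or false
    if ($raw === false || trim($raw) === '') {
        return array();
    }
    $files = array();
    foreach (explode(',', $raw) as $entry) {
        $entry = trim($entry);        // entries are separated by ",\n"
        if ($entry !== '') {
            $files[] = $entry;
        }
    }
    return $files;
}

$main = php_ini_loaded_file();        // path of php.ini, or false if none
echo 'Main php.ini: ', ($main === false ? '(none)' : $main), "\n";
foreach (scanned_ini_files() as $file) {
    echo "Scanned: $file\n";
}
```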

Short version: Funnily enough, John Mertic was hoping for this for the Windows installer too...

RFC: Moving core extensions to PECL

Ilia mailed the internals list to sound out his idea about moving the com_dotnet, mhash and sockets extensions to PECL for the PHP 5.2.0 and PHP 6 releases. His reasons for picking on these three were explained in some detail.

The com_dotnet extension (Ilia called it 'COM') has no maintainer at present and a high number of bug reports open against it - many of which are crashes. It's a Windows-only extension; PECL Windows binaries are readily available, and most Windows users don't compile the extension from source anyway. A move to PECL would allow for an independent release cycle, which would make it possible to deploy any fixes quickly; and Ilia hoped it might also encourage individuals or companies to take an interest in maintaining the extension.

The mhash extension has been superseded by ext/hash, which is enabled by default and requires no external libraries. Similarly, the sockets extension - which is unmaintained - has largely been superseded by the more consistent and stable streams API. This much said, Ilia opened the floodgates by asking people for their thoughts on the matter.
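For anyone wondering what the ext/hash replacement looks like in practice, here is a minimal sketch. The sample string is arbitrary; the point is that hash() returns hex by default, while its third argument gives the raw binary digest that mhash() callers are used to.

```php
<?php
$data = 'The quick brown fox';

$hex   = hash('md5', $data);         // lowercase hex digest
$raw   = hash('md5', $data, true);   // raw binary digest
$algos = hash_algos();               // every algorithm this build supports

echo "md5: $hex\n";
echo count($algos), " algorithms available\n";
```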

Straight away, the difference between *nix-only developers and those who occasionally code under win32 showed up. Sebastian Bergmann and Tony both agreed with Ilia on all counts, but Stas Malyshev wrote point-blank that com_dotnet is an incredibly useful extension when running PHP under Windows. He didn't see that moving it to PECL would improve the bug count any, and it doesn't cost much to keep it in the core. Ilia argued that com_dotnet isn't enabled by default under Windows anyway - so how could it make any difference whether it stays in the core or moves to PECL? Stas - and pretty much everybody else who ever used Windows - argued that not only is it both built-in and enabled by default, it also happens to provide core win32 functionality. Pierre-Alain Joye and Frank Kromann both wrote that they were prepared to do what they could about the bug reports mounting up there. Wez Furlong arrived on the scene to explain that, although he no longer has time to maintain the extension, he still has a vested interest in it and would be happy to review any patches. He suspected the bug reports would prove to be 'mostly duplicates' (now there's an interesting phrase) and wrote that it works fine for most of its users, most of the time. Wez wasn't aware of the extension being critically flawed to the point where it should be removed from the core. Besides which, 'it's a bit like suggesting that we make the exec family of functions an optional download. In theory, it sounds like a great idea, in practice, it's a pain in the ass and will leave people wondering what you were smoking :-)'. Marcus Börger was simply concerned over maintenance; he didn't mind keeping ext/com_dotnet so long as somebody was willing to take responsibility for it. Faced with the evidence - all of which was fully backed by the majority of the core dev team - Ilia dropped the idea of moving com_dotnet, crossing his fingers that somebody would find time to go through its bug reports. On to the rest...

Stas had nothing against moving ext/mhash, except that ext/hash needed better coverage in the manual beforehand (to which Ilia agreed). He felt that the sockets extension probably shouldn't be moved in a minor PHP version; people use it in applications, and would need to rewrite their code to a significant degree to migrate those apps to use streams instead. Ilia agreed, but pointed out that there is nobody to support the extension's users; they could be leading themselves to a dead end if they started using ext/sockets in its final days. Stas suggested putting a notice in the manual to say it would be leaving the core in PHP 6 and recommending streams instead. His chief concern was existing code, rather than new users. Derick Rethans and Pierre both expressed similar concerns over ext/sockets. Frank thought the extension shouldn't go to PECL until everything it handles can be handled using the streams API. Mike Wallner simply wrote '+1 for PHP 6' against ext/sockets.
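To give a feel for the streams side of that migration, the following sketch runs a tiny TCP exchange over the loopback interface using only stream functions. It is not taken from the thread, and it covers just the simple client/server case; as Frank's comment suggests, not everything in ext/sockets has a streams counterpart.

```php
<?php
// Listen on an ephemeral loopback port so no outside network is needed.
$server = stream_socket_server('tcp://127.0.0.1:0', $errno, $errstr);
if ($server === false) {
    die("listen failed: $errstr ($errno)\n");
}
$address = stream_socket_get_name($server, false);   // local "host:port"

// Client side: one call stands in for socket_create() plus socket_connect().
$client = stream_socket_client("tcp://$address", $errno, $errstr, 5);
fwrite($client, "ping\n");

// Server side: accept the pending connection and answer it.
$conn = stream_socket_accept($server, 5);
$received = fgets($conn);
fwrite($conn, "pong\n");

// Ordinary file functions work on the resulting stream.
$reply = fgets($client);

fclose($conn);
fclose($client);
fclose($server);
```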

Johannes Schlüter - the only developer to show any sign of caring about ext/mhash - didn't think any extension should be removed from the core 'on a minor release'. He'd like to mark them as deprecated instead, to give the users a chance to see the warnings and update their code beforehand. Ilia agreed that it would be wrong to move anything as part of a patch release, and promised not to move anything before PHP 5.3.0. It would be nice, he added, if someone from the docs team could add a notice to the ext/mhash and ext/sockets pages in the PHP Manual indicating that those two extensions wouldn't be core for much longer.

Short version: The lack of concern over ext/mhash is a measure of the success of ext/hash.

TLK: PHP 5.2.1 status

Ilia, wearing his PHP 5.2 series Release Master hat, made an announcement regarding his plans for the PHP 5.2.1 release:

Just a quick notice to everyone that I'll be making RC1 of 5.2.1 on
Thursday (November 14th), after which only bug fixes will be allowed
into the tree. So, if you have any major commits pending, now is the
time to make them. This will be the only RC this year and will be
followed by RC2 in the first week of January. If all goes well expect
the PHP 5.2.1 final release in late January.

Short version: He meant December.

RFC: unset($object)

Following on from a brief on-list discussion between PHP users about the difficulty of destroying cross-referenced objects, Sebastian Bergmann posted an RFC suggesting internal support for this. Having proved that - as Arnold Daniels had found earlier - calling unset() on a parent object does nothing because the child still references it, Sebastian suggested a 'magic' method that would automatically be called when unset() is called on an object that implements it, allowing the user to do something like this (given that the name __unset() is already taken):

public function __new_magic_method() {
    $this->children = array();
}

Sebastian asked whether anyone was aware of any solution to the problem other than explicitly unsetting the $children array, adding that even if one exists he believed the proposed method would be useful.
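The problem, and the explicit workaround Sebastian mentions, can be reproduced with a small script. The Node class and its release() method are hypothetical names for this sketch; release() stands in for the proposed magic method, except that it has to be called by hand.

```php
<?php
class Node
{
    public static $destroyed = 0;    // counts destructor calls
    public $parent = null;
    public $children = array();

    public function addChild(Node $child)
    {
        $child->parent = $this;      // back-reference: this creates the cycle
        $this->children[] = $child;
    }

    // Userland stand-in for the proposed magic method.
    public function release()
    {
        foreach ($this->children as $child) {
            $child->parent = null;
        }
        $this->children = array();
    }

    public function __destruct()
    {
        self::$destroyed++;
    }
}

function build_tree()
{
    $parent = new Node;
    $parent->addChild(new Node);
    return $parent;
}

// Case 1: plain unset() leaves the pair alive, since each still holds the other.
$a = build_tree();
unset($a);
$after_plain_unset = Node::$destroyed;

// Case 2: break the cycle first, then unset().
$b = build_tree();
$b->release();                       // the child is freed once release() returns
$after_release = Node::$destroyed;
unset($b);                           // now the parent can be freed as well
$after_release_unset = Node::$destroyed;
```

In the first case neither destructor runs at the point of unset(); in the second, both objects are gone as soon as the last variable is.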

Etienne Kneuss suggested using references:

function &setup() {
    $parent = new ParentC;
    $child = new Child;
    $parent->children[] = &$child;
    $child->parentO = &$parent;
    return $parent;
}

$parent = &setup();
$parent = null;

echo "--end-- ";

Sara Golemon got involved. She wrote that the code would provide better analysis if there were notification of the current state of both relevant reference counts:

class foo {
    public function __delref($objectstore_refcount, $zval_refcount) {
        if ($objectstore_refcount == 1) {
            /* Only one zval is pointing to this object, i.e. the one used/shared by the child's backreference */
            if ($zval_refcount == 1) {
                /* Only one variable is pointing to this zval, i.e the child's backreference property */
                $this->children = array();
            }
        }
    }
}

That code assumes there is only one child. Multiple children will 'probably' share a single object store reference through a common zval with multiple references:

class foo {
    public function __delref($objectstore_refcount, $zval_refcount) {
        if ($objectstore_refcount == 1) {
            if ($zval_refcount == count($this->children)) {
                /* Only one children's properties are pointing to this zval */
                $this->children = array();
            }
        }
    }
}

However, 'that still doesn't cover the case where there are multiple object store references spread out among the children (and possibly other variables not contained in the object itself)'. It is not possible to aggregate the total number of variable->zval->object references at this point; only the zval in the process of being dereferenced is known. Sara added that this is part of the reason PHP doesn't have delete().

Sebastian failed to see why anyone would need to know the current state of the reference counts; if his new magic method existed there should be no problem with unsetting the $children property:

unset($parent) -> "magic method" -> $children = array() -> no references to $child objects

although he recognized that nothing would happen where other references to the $child object exist.

Arnold Daniels suggested hunting down circular references on unset(). Although noting that 'this might be bad for performance', he felt that memory leak prevention was the responsibility of the language rather than that of the user. When an object is destroyed because its reference count is 0, unset() should be called for all its children; unset() should also be called on a variable leaving the call stack unless destroy() was. Ants Aasma reckoned Arnold's solution would definitely be too slow; most of the other languages he knows only check visibility periodically.

Matthias Pigulla wrote that the issue of cross-referenced object destruction comes up regularly on PHP user mailing lists and occasional bug reports; while workarounds are possible in userspace code, they are 'painful and messy'. If the only problem with Arnold's solution is that detection is too slow, would it be possible to add a userspace function called something like gc_cleanup() to perform the scan? Those using PHP in a request/response environment wouldn't need it because everything is freed at the end of a request, but users working in other environments could find a good place to make the call, and are unlikely to complain about the delay.
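Matthias's gc_cleanup() was only a suggested name, but PHP 5.3 later shipped a cycle collector with a userspace trigger along similar lines, gc_collect_cycles(). The sketch below assumes PHP 5.3 or later; the Leaky class is made up for the example, and nothing here implies that the collector grew out of this particular thread.

```php
<?php
class Leaky
{
    public static $destroyed = 0;    // counts destructor calls
    public $peer = null;

    public function __destruct()
    {
        self::$destroyed++;
    }
}

// Builds two objects that reference each other, then drops both variables.
function make_cycle()
{
    $a = new Leaky;
    $b = new Leaky;
    $a->peer = $b;
    $b->peer = $a;
}

gc_enable();                         // make sure the collector is switched on
make_cycle();
$before = Leaky::$destroyed;         // neither object has been freed yet
$collected = gc_collect_cycles();    // explicit scan, at a moment of our choosing
$after = Leaky::$destroyed;          // both destructors have now run

echo "destroyed before scan: $before, after scan: $after\n";
```

A long-running script could make that call between jobs, which is the kind of 'good place' Matthias had in mind.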

Short version: A problem without an owner.

CVS: Ming API upgrade, PDO in HEAD

Changes in CVS that you should probably be aware of include:

  • Ext/pdo_mysql now defaults to use buffered queries and prepared statement emulation [Ilia]
  • PDO bugs #39483 (Problem with handling of char in prepared statements), #38252 (Incorrect PDO error message on invalid default fetch mode), #38319 (Remove bogus warnings from persistent PDO connections) and #36798 (Error parsing named parameters with queries containing high-ascii chars) were fixed [Ilia]
  • In ext/session, bug #37627 (session.save_path check checks the parent directory) was fixed [Ilia]
  • Zend Engine bugs #38274 (Memlimit fatal error sent to "wrong" stderr when using fastcgi), #39721 (Runtime inheritance causes data corruption) and #39775 ("Indirect modification ..." message is not shown) were fixed [Dmitry Stogov]
  • tolower() related functions were improved in VC2005 builds by caching locale and using tolower_l() - giving a 10-18% speedup in benchmark tests [Stas]
  • In ext/openssl, bug #39571 (fsockopen() timeout param does not affect ssl/tls handshake) was fixed [Ilia]
  • In ext/xsl, bug #39625 (Apache crashes on importStylesheet call) was fixed [Rob Richards]
  • POSIX bug #39754 (Some POSIX extension functions not thread safe) was fixed [Ilia]
  • In ext/pdo_mysql, bug #39759 (Can't use stored procedures fetching multiple result sets) was fixed [Ilia]
  • In ext/oci8, bug #39732 (oci_bind_array_by_name doesn't work on Solaris 64-bit) was fixed [Tony]
  • Heap corruption when adding/caching typelib in ext/com_dotnet was fixed, closing bug #39606 [Rob]
  • Ancient safe_mode bug #29840 (is_executable() does not honor safe_mode_exec_dir setting) was fixed [Ilia]
  • Hartmut Holzgraefe's new-ish function, sys_get_temp_dir(), was backported to the PHP_5_2 branch [Hannes]
  • In ext/gd, bug #39780 (PNG image with CRC/data error raises fatal error) was fixed [Pierre]

CVS HEAD perked up a little this week. Andrei continued implementing Unicode support for myriad core functions, including var_export(), http_build_query(), parse_url(), dl() and (in ext/date) strptime() which, along with version_compare(), uses runtime encoding for conversion. He made headers-related functions accept Unicode strings ('but only if their contents can be converted to ASCII'), and made the iptc* family of functions Unicode safe. The latter, however, went into CVS HEAD untested, 'cause I know crap about IPTC'. Sara joined the party, supplying a Unicode upgrade for fgetcsv() and adding str_getcsv() before Andrei challenged her over the portability of her code. She subsequently changed fgets() behaviour to be back compatible again.

Ilia meanwhile worked his way steadily through ext/curl. Andrei wanted to know how he was dealing with POSTFIELDS, commenting that they would need to be able to cope with Unicode content. Ilia agreed that they present a problem, mainly because the first character needed checking for @ (signifying a file upload). Perhaps data could be posted as UTF-8, but the form might not be expecting that, and there was no way to be certain what the form actually did expect. Posting binary as-is, and Unicode as UTF-8, might resolve that issue, but he had concerns over possible side effects. Ilia subsequently applied a fix to allow the submission of Unicode data in UTF-8 form.

Tony's work on ext/oci8 appeared to be running along smoothly, with oci_statement_type() covered explicitly and 'most of the OCI8 functions' now marked as Unicode aware. Rob seemed equally on top of the large and sprawling DOM extension, and marked a vast array of its functions as Unicode safe in one huge commit.

Frank Kromann updated the ext/ming API in PHP_5_2 and CVS HEAD, bringing several new PHP methods to life in the process. Ming users now have access to the SWFVideoStream methods init(), setDimention() [sic] and getNumFrames(); swfprebuiltclip::init(); SWFMovie::namedAnchor() and SWFMovie::protect(); and a new function, ming_setSWFCompression(). There are also two new SWFTextField constants, SWFTEXTFIELD_USEFONT and SWFTEXTFIELD_AUTOSIZE, and several for SWFSound: SWF_SOUND_NOT|ADPCM|MP3|NELLY_COMPRESSED, SWF_SOUND_NOT_COMPRESSED_LE, SWF_SOUND_5|11|22|44_KHZ, SWF_SOUND_8|16_BITS and SWF_SOUND_MONO|STEREO.

Tony committed a patch introducing a BSD licensed implementation of double-to-string utilities to replace the previous LGPL'd version. He noted that the change also fixes thread safety issues in zend_strtod(). Matt Wilmas noticed some changes to formatted_print.c in there. He had a vested interest in the file, having had a patch for it since August. He wrote that Tony had added the specifiers g/G and E - part of his own patch - but missed the F specifier, although it was still present in php_formatted_print(). Matt was prepared to accept that the locale decimal point might be handled differently now, but thought F should be in there for back compatibility. Tony promptly added the F specifier, and took a look at Matt's old patch. He couldn't see any obvious problems with it, but wrote that he'd need to play with it before he was certain enough to commit it.
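For readers who don't use these specifiers every day, a short sketch of the difference Matt cared about: f formats a float using the current locale's decimal point, while F always uses a dot. The sample value is arbitrary.

```php
<?php
$value = 1234.5;

$fixed_locale   = sprintf('%.2f', $value);  // decimal point follows LC_NUMERIC
$fixed_portable = sprintf('%.2F', $value);  // decimal point is always "."
$scientific     = sprintf('%.2e', $value);  // exponent notation
$general        = sprintf('%g', $value);    // general format

echo "$fixed_locale $fixed_portable $scientific $general\n";
```

The F form is the safer choice whenever the result is parsed by another program rather than read by a person.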

Stefan kicked up a fuss following a commit from Ilia to fix bug #39763 (magic quotes are applied twice by ext/filter in parse_str()) in PHP_5_2. He wrote that a comment suggesting php_register_variable_safe() was responsible for adding the magic_quotes slashes was simply wrong; ext/filter does that job now. Further, a previous commit there by Tony had broken magic_quotes_gpc completely, introducing potential SQL injection vulnerabilities. Stefan wasn't entirely happy with the filter extension anyway; he thought it should be rewritten to support daisy chaining, work as a filter rather than registering the variables itself, and support cookies 'correctly'.

Ilia survived the onslaught. He wrote that Stefan was wrong about the comment in the first place; php_register_variable_safe() does indeed put the slashes there. That was precisely why Tony's earlier changes had been correct; magic quotes shouldn't be applied for PARSE_STRING() because the slash-adding function would be executed on the returned value. Ilia thought daisy chaining should be supported by providing hooks and having them call the stock filter functions. The variable registration is as it is in ext/filter because it makes the API simpler, and he didn't know of any really good reason to have it any other way. However, he was curious over Stefan's mention of incorrect cookie support, and asked him to elaborate, Stefan being a bit of an expert in that area.

Stefan looked again at the code, and agreed about the slashes. However, this simply illustrated the need for variable registration to be kept separate from filtering. As for hooks, although he had no problem with providing his own, it made no sense to him to have input filter hooks in PHP 5.1 and then find them unusable in PHP 5.2, where ext/filter takes them over. That abuse of the input filtering hooks was the main reason he could see for moving variable registration to a single place; several codepaths currently lead to different results. In fact, wrote Stefan, if ext/filter didn't insist on registering variables itself, the bug Ilia had just fixed would never have arisen. As for the cookie issue, Stefan gave a clue. Somewhere down the line, php_register_variable_ex() had been changed to handle cookies differently from other variables; cookies with the same name would be dropped after the first was registered. In ext/filter RAW this (correct) behaviour still stood, but the filtered variables behaved differently...

Pierre retorted that if it were possible to work around the filter extension, filter itself would make no sense. magic_quotes had known issues, which was why it was set to disappear in PHP 6 anyway. As for the business over the cookie, if Stefan had found a bug he should report it at bugs.php.net in the normal way; it would make it easier for the team to track it. Stefan replied loftily that his bug reports weren't wanted at bugs.php.net because he uses a patched version of PHP; he was therefore not submitting any more bug reports. However, Pierre should feel free to submit one of his own.

Ilia wrote that actually, daisy chaining wasn't a bad idea; it's just that nobody had expressed the need for it before. The variable registration, though, is as it is largely in an attempt to reduce memory usage. He suspected there might be a better solution to prevent duplication of data... Ilia concluded his post with a promise to look into the cookie business, and subsequently did. He found he'd fixed the discrepancy himself earlier in the week, in the course of fixing an entirely different bug.

Wez wound up the week with a surprise move when he merged PDO (but not the drivers, yet) from the PHP_5_2 branch to CVS HEAD. He noted in his commit message that the source currently compiles against both PHP 5 and PHP 6, and asked that anyone 'poking around in here' ensures it stays that way.

Short version [thanks Andrei]: Looking forward to the next installment of "PDO in Unicode Land"!

PAT: make test -n, CV experiments

Marcus took a look at the make test -n flag patch used by the Gentoo PHP distro, and wrote to Luca Longinotti that the PHP team couldn't use it. It would prevent shared extensions being loaded, and they couldn't use dl() in their test scripts. The only workable approach would be to have a new make call that passes -n to the test script, because the whole point was to test the entire PHP deployment rather than just the core. That said, there is already an environment variable, TEST_PHP_ARGS, that serves the purpose; you could set TEST_PHP_ARGS=-n and the problem would be gone.

Brian Shire wanted to know whether Marcus envisaged a second version of make test that included the -n option, whether -n should be included in the default make test, whether a completely different solution was needed, or whether he should drop the whole idea? He'd thought of a more complex solution but not explored it yet; it might be possible to ignore duplicate extensions during make test. However, this would mean complicating the CLI SAPI for a single, specific scenario. Brian went on to say that he hadn't known TEST_PHP_ARGS existed, and wondered whether it might be sensible to have shared extensions set it? Marcus, however, wasn't convinced that a solution was even needed. It seemed to him more of a documentation issue.

Later in the week, Brian provided another patch optimizing the extract() function by eliminating unnecessary calls to strlen(). Ilia, calling it a 'good catch', promptly applied the patch in the PHP_5_2 branch and CVS HEAD.

Alexey Zakhlestin, who had been looking into the problem of persistent memory management, believed he'd sorted out most of the issues. He wanted to know if there was an 'official' way to deep copy a non-persistent zval into a persistent zval, given that zval_copy_ctor() doesn't have a flag for this. If not, he could see a need for

zval_pcopy_ctor(zval *value, zend_bool persistent)

and

#define zval_copy_ctor(v) zval_pcopy_ctor(v, 0)

similar to the *alloc() functions.

Sara had been idly toying with compiled variables. As an experiment, she'd applied them to pi(), and had seen a consistent gain of approximately 18% using the simple test:

for ($i = 0; $i < 10000000; $i++) pi();

Although recognizing that the test is not normal usage, and that her patch doesn't address dynamic function calls, method calls or class resolution, Sara felt the results probably meant this was a good time to start discussing CV in functions.
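Anyone wanting to repeat that kind of measurement might wrap the loop in a small harness along these lines. This is a sketch, not Sara's script: the function names are made up, the iteration count is reduced, and an empty loop is timed alongside so that loop overhead can be subtracted. Newer PHP builds may optimize a call like pi() away entirely, so the numbers say little about any particular engine.

```php
<?php
// Times a loop of direct pi() calls, the same shape as the one-line test.
function time_pi_loop($iterations)
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        pi();                        // static call by name
    }
    return microtime(true) - $start;
}

// Times the bare loop, to estimate overhead unrelated to the call.
function time_empty_loop($iterations)
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        // loop overhead only
    }
    return microtime(true) - $start;
}

// Takes the best of several runs to reduce noise from other processes.
function best_of($fn, $iterations, $runs)
{
    $best = INF;
    for ($r = 0; $r < $runs; $r++) {
        $best = min($best, $fn($iterations));
    }
    return $best;
}

$n = 1000000;
$calls = best_of('time_pi_loop', $n, 5);
$loop  = best_of('time_empty_loop', $n, 5);
printf("pi() loop: %.4fs, empty loop: %.4fs, difference: %.4fs\n",
    $calls, $loop, $calls - $loop);
```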

Ilia was interested, but doubted whether the overall speed improvement would be anywhere near so high. That said, it would certainly make PHP faster than before. Sara agreed entirely; reality, as she'd already mentioned, is not reflected in her test script. Also, her figures were based on unicode.semantics=off. She thought optimized class fetches would probably help in several cases too, but wasn't prepared to put in the work for that, 'unless it sounds worth doing to enough people'. Ilia pointed out that optimized class fetches would only be useful for native classes.

Dmitry called it an 'interesting patch', and wrote that the Zend team had had a very similar idea in the past. He showed Sara that it's possible to optimize function calls much more by optimizing ZEND_INIT_FCALL_BY_NAME as well as ZEND_DO_FCALL, and that the same cached entries can be reused for all the op_arrays from a given PHP file. He offered to send a demo patch in the next few days. Sara replied that she'd only presented this as a rough idea, to spark discussion. She agreed that there was a lot that could be done to improve it, both in terms of coverage and in the actual implementation. Making the cache per-file rather than per-scope was definitely a good idea, but she hadn't found a simple way to track the current file. All in all, she looked forward to seeing Dmitry's patch.

One Kevin Hoffman offered up perhaps the tidiest bug report and patch ever seen on these lists, for bug #39751 (putenv causes string copy of freed memory region, causing crash). Edin promptly applied Kevin's fix.

Tony applied Matt's zend_u_strtod() implementation mid-week, announcing it in the commit message as a 'major speedup when using floats in Unicode mode, also fixing several problems with the current code'. Matt himself, however, had moved on to pastures new; he posted a patch for zend_u_strtol() and HANDLE_U_NUMERIC() allowing only ASCII digits and sign characters. His initial tests showed all was well, and the change brought a performance increase too.

Short version: Compiled variables could be a way forward.