Zend Weekly Summaries Issue #282


TLK: pg_execute_error
TLK: FastCGI and STDIN
REQ: Arbitrary precision datatype
TLK: Late static binding
NEW: PHP 5.1.3 RC2
TLK: GD development @ php.net
BUG: __set/ __get behaviour
REQ: Zend API bump
TLK: PHP-GTK corner
CVS: Mostly Unicode
PAT: Quiet week

TLK: pg_execute_error

Unusually, ext/pgsql maintainer Yasuo Ohgaki mailed the internals list for advice on how to deal with a perennial PostgreSQL issue. The story was that pgsql.c currently has support for prepared statements, but pg_execute() raises an E_WARNING if the query plan has not been prepared before the function is called. Yasuo termed this 'annoying', particularly when the database connection is persistent in the Web environment.

He hoped to be able to work with PHP code such as:

if (!pg_execute($db, 'myquery', array())) {
    pg_prepare($db, 'myquery', 'SQL');
    pg_execute($db, 'myquery', array());
}

without raising the warning at all. Yasuo could see four possible approaches to the problem:

  1. ignore errors from pg_execute()- identify them via the return status
  2. add pg_is_prepared()
  3. add an optional boolean parameter to pg_execute(), e.g.
    pg_execute(resource connection, string stmtname, array params, bool ignore_error)
  4. ignore pg_execute() errors only if the params array is NULL

Did anyone have any comments?

Lukas Smith wanted to know how Yasuo planned to implement option 3 on that list. As far as Lukas knew, there is no way to discover prepared statements in the current session, although he believed this is slated to change in PostgreSQL 8.2. The only way to find out whether a statement has been prepared is simply to try it, and anticipate an error on failure. That error could trigger 'all sorts of error handlers' on the database side, producing unexpected behaviour in any PHP function that had an ignore_error parameter switched to TRUE.

Yasuo, agreeing that this was exactly the issue, went ahead and applied a fix. He replied to the list mail saying that he'd simply killed the E_WARNING in pg_execute(); it seemed the most efficient way to deal with the problem. An alarmed Lukas fired off another email to clarify: calling pg_execute() on an unprepared statement will cause the transaction to be rolled back on the next commit. Encouraging the use of pg_execute() to find out whether the statement has been prepared is, therefore, 'simply wrong'. The appropriate place to address the issue would be in userspace code, by using error suppression in the error handler.

Wez Furlong and Edin Kadribasic were both quick to back Lukas' position, and called for Yasuo to revert his patch. Yasuo argued that it was as though file_exists() were to raise an E_ERROR on failure; PostgreSQL doesn't provide a means to check whether a plan is already defined, so the script developer can't design around it. Users are advised to prepare the statement before getting into the transaction block, and check the return status of pg_execute()... how about if he disabled the E_WARNING, but allowed other errors to be caught? Would that be okay?

Lukas reiterated his view that PostgreSQL users should use @pg_execute() and then put something in the error handler like:

function ErrorHandler($errno, $errstr, $errfile, $errline) {
    // ignore silenced function calls
    if (!error_reporting()) {
        return;
    }
    // ... normal error handling continues here
}

Admittedly this was a bit of a hack; the 'beautiful' alternative would be to manage the prepared statements in some persistent layer.
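The silenced-call check Lukas describes can be sketched as a complete, runnable handler. This is an illustration only, not code from the thread: the $logged array is hypothetical, and trigger_error() stands in for @pg_execute() so that the example runs without a database. The bitmask test is a small variation on Lukas' check; it also copes with PHP 8, where error_reporting() no longer returns 0 inside a handler for a silenced call.

```php
<?php
// Hypothetical log, used only for this illustration.
$logged = [];

set_error_handler(function (int $errno, string $errstr) use (&$logged): bool {
    // Ignore calls silenced with @. Under @, error_reporting() is 0 in
    // older PHP versions and a mask of fatal levels only in PHP 8, so a
    // silenced warning never matches the current mask.
    if (!(error_reporting() & $errno)) {
        return true; // treat as handled, nothing is logged
    }
    $logged[] = $errstr;
    return true;
});

error_reporting(E_ALL);

// Stand-in for @pg_execute() on a statement that was never prepared.
@trigger_error('silenced warning', E_USER_WARNING);

// An unsilenced warning still reaches the log.
trigger_error('visible warning', E_USER_WARNING);
```

After this runs, $logged holds only the unsilenced message, which is the behaviour Lukas wanted from a userspace handler.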

Yasuo pointed out that it's not usually recommended to use the @ operator, but Lukas argued that hiding limitations in third-party libraries causes more problems than it solves - not least since PostgreSQL itself is slated to solve this particular issue in version 8.2. He offered an alternative to multiple suppression; prepare the statement again before entering the loop. This would mean only having to silence pg_prepare() once, although it would of course add some overhead.

Wez intervened to say that it was a bad idea to commit such a large change in behaviour into the stable branch partway through the release process anyway, regardless of the soundness of the idea, and again asked Yasuo to revert his patch. Yasuo took his point and reverted it in the PHP_5_1 branch immediately. He continued the discussion, however, saying he'd heard the intention for the future of PostgreSQL execution calls is something like 'execute; if it fails, prepare', with version 8.2 backends having a view for currently registered plans. That meant clients would need to send requests over the network just to discover whether a plan was defined, which he saw as a waste of resources:

if (!pg_execute('plan')) {
    pg_prepare('plan', $sql);
    pg_execute('plan');
}

not least because pg_prepare() - an expensive function at the best of times - raises an E_WARNING if the plan is already defined. PostgreSQL users would only be able to take advantage of the performance benefits in using prepared statements when working over a persistent connection. Perhaps a good way would be to remove the E_WARNING thrown when pg_prepare() discovers duplicate plans; would this be a satisfactory approach?

Lukas agreed that it would be nice to have something like pg_is_prepared() in place, but reiterated that it would be best to wait until there is a native solution in PostgreSQL and then work with it. It's possible to work around the limitations in userland code; Yasuo's patch should therefore be reverted in all PHP branches.

Yasuo wrote that he hadn't expected to come across such strong opposition to removing an E_WARNING, given that it seemed to him the best way of resolving the issue, but - given that this was the case - went on to revert.

Short version (thanks Jani): Nothing to see here, move along.

TLK: FastCGI and STDIN

A PHP user named Matthew needed some help figuring out how to use FCGI_STDIN with a running PHP script. He explained that he'd written a server in C++, which has until recently been running PHP through a CGI interface. He'd recently implemented a FastCGI interface to replace that, but had found that interactive command line scripts that used to work under CGI were now failing. It seemed that fopen("php://stdin", "r"); isn't the correct way to read command line input under FastCGI?

Also, he wanted to keep the connection open to php-cgi in order to avoid connect/accept calls, if that was possible. He realized it would be possible to bind a second port to pipe requests from PHP streams to the server, but was there a simpler approach?

Wez suggested briefly that Matthew should write a real daemon using stream_socket_server() rather than 'abusing fastcgi/cgi to work in that way'.

Matthew took exception to the term 'abuse', pointing out that both protocols were in fact designed to support this kind of usage. He didn't understand why PHP, alone among programming languages, should have no support for using FCGI_STDIN. The simplest approach he could see toward making a FastCGI client application would be to close STDIN and STDOUT, dup2() to the end of a pipe, grab data there, wrap it in a FastCGI header and print it to the server. STDIN data from the server would enter a second thread or a multiplexing select() core, use the request ID to find the correct STDIN pipe, and send the data there.

Wez, pointing out that 'a simple dup2() is not sufficient', argued that it is always better to tackle a problem "the right way" rather than trying to force something to work in a way that it doesn't. The use of threads to "solve" this non-problem was also a bad idea - and where did Matthew intend to put the dispatcher for multiplexing? He strongly advised Matthew to step back, take a deep breath and look at the problem afresh; 'and remember: KISS'.

Short version: That's KISS as in Keep It Simple Stupid, not marital guidance.

REQ: Arbitrary precision datatype

Andreas Korthaus hijacked an elderly thread (subject: 'Floored!') from the php-general mailing list. In that thread, a PHP beginner had come across the problem of floating point precision for the first time. Rasmus Lerdorf had referred the newbie to the manual page on the subject, explaining in passing that the options are to work entirely in integers or to introduce a little "fuzz factor" when operating on floating point values. Andreas used the thread as a springboard to launch an impassioned plea for something like GMP or BC to be implemented as part of the core; something that could be used transparently (like float), but having arbitrary precision (like java.math.BigDecimal, PostgreSQL's NUMERIC, GNUCash, etc). Would anyone really care if it slowed down floating-point calculation?

Andreas pooh-poohed Rasmus' suggestion (again in the old thread) that computers cannot accurately represent a fraction; the way he saw it, if a child in elementary school can do it, there's no reason a computer shouldn't do it the same way. It should simply be made to emulate the child's reasoning. Arbitrary precision numbers could be stored in a struct, alongside an array of the digits (as integers) and the position of the decimal point. Then two arbitrary precision numbers could be calculated with the same steps the child learned in school. Sure, it'd take up more memory and more CPU cycles but, given the amount of resources used by the average PHP script anyway, he didn't see this as an issue.
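The schoolbook method Andreas has in mind can be sketched in userland PHP. This is a minimal illustration under stated assumptions, not a proposal from the thread: add_decimal() is a hypothetical helper, it handles only non-negative numbers written as plain decimal strings, and it covers addition alone.

```php
<?php
// Hypothetical helper: add two non-negative decimal strings digit by digit.
function add_decimal(string $a, string $b): string
{
    // Split each number into its integer and fractional digits.
    [$ai, $af] = array_pad(explode('.', $a, 2), 2, '');
    [$bi, $bf] = array_pad(explode('.', $b, 2), 2, '');

    // Line up the decimal points by padding both numbers with zeros.
    $fracLen = max(strlen($af), strlen($bf));
    $intLen  = max(strlen($ai), strlen($bi));
    $x = str_pad($ai, $intLen, '0', STR_PAD_LEFT) . str_pad($af, $fracLen, '0');
    $y = str_pad($bi, $intLen, '0', STR_PAD_LEFT) . str_pad($bf, $fracLen, '0');

    // Add from right to left, carrying into the next column.
    $carry  = 0;
    $digits = '';
    for ($i = strlen($x) - 1; $i >= 0; $i--) {
        $sum    = (int) $x[$i] + (int) $y[$i] + $carry;
        $digits = ($sum % 10) . $digits;
        $carry  = intdiv($sum, 10);
    }
    if ($carry > 0) {
        $digits = $carry . $digits;
    }

    // Put the decimal point back in its position.
    if ($fracLen === 0) {
        return $digits;
    }
    return substr($digits, 0, -$fracLen) . '.' . substr($digits, -$fracLen);
}

echo add_decimal('0.1', '0.2'), "\n";   // 0.3
echo add_decimal('9.95', '0.05'), "\n"; // 10.00
```

The cost Andreas acknowledges is visible here: one loop iteration per digit, where a native float addition is a single machine instruction.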

He went on to denounce the recommendation of a "fuzz factor" 'in days of CPUs with billions of cycles/second', simply to calculate financial data. In his experience, very few PHP users resort to workarounds like that or to the bcmath/gmp extensions; either their applications work through sheer luck, or else they are able to overlook the errors caused by floating point arithmetic - or unaware of them.
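For readers who have not met the problem, the two workarounds Rasmus mentioned look roughly like this. The tolerance value and the variable names are assumptions chosen for the illustration; a suitable tolerance depends on the application.

```php
<?php
// Binary floating point cannot represent 0.1 or 0.2 exactly.
$sum   = 0.1 + 0.2;
$exact = ($sum == 0.3);              // false

// Workaround 1: compare with a small "fuzz factor".
$epsilon = 1e-9;                     // hypothetical tolerance
$close   = abs($sum - 0.3) < $epsilon; // true

// Workaround 2: work entirely in integers, e.g. cents instead of euros.
$cents    = 10 + 20;
$intExact = ($cents === 30);         // true

var_dump($exact, $close, $intExact);
```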

Andreas concluded by writing that '64-bit integer and an "arbitrary precision numbers" datatype are the last major features missing in PHP', and - surprisingly meekly after all that fire and thunder - asked whether the 64-bit integer would make it into PHP 6?

PHP user Leon Matthews wrote to point out that the 64-bit integer already exists in PHP 5, and told a tale of regression test failure where the test had expected an error when dealing with UNIX timestamps post-2038. With 64-bit support, there is no error; '64-bit timestamps should keep track of time nicely until sometime after the heat death of the universe....' he ended happily. But as Andreas pointed out, there's a difference between having a generic integer type that will support 64-bit processing and a 64-bit integer type that is guaranteed to always be 64-bit regardless of the platform.

Short version: Arbitrary precision is a frequent topic. A detailed analysis of the problem is here; some attempts at resolving it are here and here.

TLK: Late static binding

Dmitry Stogov finally found time to look into Mike Lively's late static binding patch and make it work in all the cases he could think of. He returned the improved patch and test cases - unusually, in a format that the internals list attachment stripper can handle (tar.gz), so we could all see it. Dmitry added bluntly that he still didn't like the name static, and wasn't convinced that the concept was needed in PHP at all.

Jochem Maas wrote somewhat acerbically that the new Zend Framework needs it if the team there want to implement something like

Person::findAll($myFilter)

and 'every half-assed PHPer doing OO in PHP 5' would love to know how they intended implementing it otherwise! If there is a clean way of doing this without introducing static late binding, he has been unable to find it. Current solutions tend to be something like

$peeps = Person::findAll('Person', $myFilter)

or

$p = new Person; $peeps = $p->findAll($myFilter);

and, wrote Jochem, that last example feels to him like 'some OO principles are being thoroughly raped'.
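For comparison, here is a sketch of the call Jochem wants, written with the static:: keyword that late static binding uses in PHP 5.3 and later. The Model base class, the body of findAll() and the returned array are all hypothetical; a real implementation would query a data store rather than echo back its arguments.

```php
<?php
// Hypothetical base class for this illustration.
abstract class Model
{
    public static function findAll(array $filter): array
    {
        // static::class names the class the call was made on (Person),
        // whereas self::class would always name Model.
        return ['class' => static::class, 'filter' => $filter];
    }
}

class Person extends Model
{
}

$result = Person::findAll(['surname' => 'Smith']);
echo $result['class'], "\n"; // Person
```

Neither the class name nor a throwaway instance has to be passed in, which is the point of Jochem's complaint about the current workarounds.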

Mike Lively went through Dmitry's changes; his only query was over both executor_globals and execute_data being used to store the caller_scope pointer. Dmitry explained that EX(caller_scope) is a temporary value set in ZEND_INIT_METHOD_CALL and then copied into EG(caller_scope) during ZEND_DO_FCALL_BY_NAME. Something like

Foo::bar(test());

would cause the method call to occur several times before the first DO_FCALL; to handle this situation, EX(caller_scope) is stored into a special stack.

OO fan Marcus Börger pleaded for the patch to be committed as it is; the keyword could always be changed at a later date, and he needed late static binding for SPL_Types. Andi Gutmans was more cautious; he still had some questions about the patch for Dmitry before he'd be happy to apply it. He also reiterated that this:: was a better keyword than static::, pointing out that this:: had received fairly widespread support.

Dmitry wrote to Marcus explaining that the reason he wasn't happy about committing the patch as it stood was that 'this seldom-used feature' will slow down each PHP call. He intended to measure the performance loss over the coming week, and invited Marcus to do the same.

Short version: It's problematic.

NEW: PHP 5.1.3 RC2

Ilia Alshanetsky, as Release Master for the PHP 5.1 series, announced the second Release Candidate for PHP 5.1.3 as follows:


PHP 5.1.3RC2 has just been released, about a week late, but better
late then never ;-). Please test this RC as much as possible, if it
proves to be stable, this release will be published as final next
week Thursday. The source packages can be found here:
http://downloads.php.net/ilia/php-5.1.3RC2.tar.bz2
MD5: 8ad7bddc9a3b4dbcd2ecb1d6f5446970

http://downloads.php.net/ilia/php-5.1.3RC2.tar.gz
MD5: 1e66780413580bc4a0742fa302735c99

Win32 binaries will be available for download shortly.

Edin Kadribasic, as ever, wasn't far behind him with the Windows binaries:


http://downloads.php.net/edink/php-5.1.3RC2-Win32.zip
http://downloads.php.net/edink/pecl-5.1.3RC2-Win32.zip

Ron Korving noticed that two of the minor code cleanups he'd suggested during the optimization discussions hadn't actually been addressed. Ilia thanked him for pointing them out and applied fixes in CVS; but as Marcus wrote, 'if only these two spots were all problems we had. :-)'.

Short version: Download, test, report bugs to the usual place.

TLK: GD development @ php.net

The original author of the GD graphics library, Thomas Boutell, posted a lengthy missive to the PHP internals list 'offering the bazaar-keepers the keys to the cathedral'. He wrote that - due to lack of time - he hadn't released a new GD update in some time, and the project is effectively forked at present because many of the fixes and improvements only exist in the PHP version. It made sense to him to move GD development to php.net.

Thomas would still like to maintain the project home page, and he would also hope to contribute to GD development as an individual developer; but he wanted to relinquish control. He asked whether the PHP community were interested in taking it on, and raised issues over licensing, support for GD usage outside PHP, documentation for the C API and patch management.

Rasmus immediately confirmed that php.net are very much interested in the GD project, and that there is the infrastructure in place to handle the move. He saw no problem with either the existing licensing or with providing support for the C API - given that PHP uses it - beyond abstracting the PHP-specific 'hacks to make GD play nice with the memory manager' so that any kind of memory manager override could be allowed. GD would live in its own top-level repository in cvs.php.net, and there are ACLs on php.net CVS access - it would be straightforward to add GD-only accounts for developers wanting to work on the project. The only problem he could see was that there needed to be a volunteer project lead; he himself didn't have the time to do the job either.

Pierre-Alain Joye, writing as the lead maintainer of the GD library embedded in PHP, was very much interested. He backed Rasmus' points regarding the licensing, the C API support and php.net's infrastructure, and was interested to know how the current documentation for the GD C API is maintained. The patches sitting in Thomas' inbox could be forwarded to him; he'd add them as soon as the issue tracker was up and running. Obviously, added Pierre, he was volunteering to lead the project if there was the need. He hoped, however, to get other GD maintainers and users involved. He ended by thanking Thomas for his decision in this matter.

Thomas later confirmed his official agreement both with the move and with the choice of Pierre as GD maintainer. He added that he will make an announcement on the project home page directing users to php.net at the point of the next release.

Short version: Outside recognition for years of hard work.

BUG: __set/ __get behaviour

Jochem wrote again. He'd found that the following piece of code:

<?php

class T {
    private $array = array();

    public function __get( $key ) {
        echo "Getting $key ";
        return $this->array[$key];
    }

    public function __set( $key, $value ) {
        echo "Setting $key ";
        $this->array[$key] = $value;
    }
}

$t = new T;
$t->insideArray = array(); // SET 1
$t->insideArray["test"] = "testing!"; // SET 2
var_dump( $t );

?>

gave him

Setting insideArray
Getting insideArray
object(T)#1 (1) {
  ["array:private"]=>
  array(1) {
    ["insideArray"]=>
    array(1) {
      ["test"]=>
      string(8) "testing!"
    }
  }
}

under both PHP 5.0.4 and PHP 5.1.0. He had expected either the line commented with SET 2 to trigger a failed call to __set(), or the key test to be set in the array returned by __get() but not in $this->array['insideArray']. Shouldn't __set() protect the elements that already exist in an object?

Following criticism from other PHP users (not the development team) regarding user level questions being asked on the internals list, Jochem went on to say that a variation of this code currently doing the rounds on the php-general list actually segfaults under PHP 5.0.4. Under PHP 5.1.1 it throws a fatal error:

FATAL: emalloc(): Unable to allocate 1916888421 bytes

Surely this was an internals issue? He complained that it was all too easy to make PHP segfault when using __get(), __set() and __call(), and he'd come to believe this was a problem in the Zend Engine.

Wez pointedly remarked that it would be more useful to file some solid bug reports than to bitch about the problem, either on the internals list or anywhere else.

Short version: That mystery URL is http://bugs.php.net. Got it?

REQ: Zend API bump

Pierre had some comments to make regarding a Zend Engine change committed to the 5_1 branch by Antony Dovgal and affecting several of the newer extensions. The patch fixed bug #36898 (__set() seems to leak memory when extending internal classes) by adding new functions to initialize and destroy zend_object structs. Pierre agreed that it was an important fix, but queried the wisdom of adding two new functions to the Zend API during the Release Candidate phase for PHP 5.1.3. He also pointed out that it is no longer possible to compile the PECL extensions Tony had altered to make use of those functions, against the current release of PHP 5.1. Wouldn't a Zend API bump be in order?

Wez pointed out that extensions not using the new API would in fact continue to work; they would simply continue to leak when __set() is used with them. However, he agreed that the Zend API version number should be bumped, allowing extensions to make use of the new API. Pierre went further, saying that the Zend API number should be bumped every time something new is added; 'it is getting really hard to know when and what was introduced'.

Short version: The Zend API number's still 20050922 at the time of writing...

TLK: PHP-GTK corner

Andrei Zmievski had one of those 'aha!' moments at the beginning of the week, and added the gtype object property into CVS. That means that

$obj->gtype ;

will return the object's type - which is useful, because we don't have $obj->get_type() exposed, so $obj->get_name() has been the only way to reach that information until now.

Anant Narayanan nudged Andrei about his waiting patches, and Andrei subsequently added GtkAboutDialog into CVS (but not GtkPlug or GtkSocket).

Madeleine Drake wrote in with an unusual request; she wanted a Windows binary of PHP-GTK 1 compiled against PHP 4.4.2. She explained that she hoped to file a bug report about an exit hang under Windows 98, and felt that php.net wouldn't talk to her unless she'd tested with the 'latest and greatest' version of PHP. I said I'd make her one if she hassled me, but suspected her bug report would get short shrift anyway given that the issue only arises with the Win9x/PHP 4/PHP-GTK combination.

Christian Weiske came up with an idea for a new method, GtkWidget::set_visible(). He wrote that he needed to call show()/hide() on menu items dynamically in response to a GtkListStore value, and although it was perfectly possible to do this with an if block, it was an ugly approach. Since Christian felt that this routine was likely to arise frequently in PHP-GTK programming, it would be nice to be able to call it in a single line. Scott Mattocks agreed, and wrote a patch implementing the suggestion but with a second optional boolean parameter to determine whether show()/hide() or show_all()/hide_all() should be called. He added that his patch didn't actually work, and he had no idea why; it compiled, the method was callable, the values passed to it were correct, it just didn't toggle show()/hide(). Andrei wrote that he was wary of adding PHP-specific methods to GTK+ widgets, but did it anyway once he'd tracked down the reason Scott's otherwise perfect patch failed. (Scott had declared his variables as the integer type gboolean rather than as the unsigned char type zend_bool, so the result was always TRUE.)

Finally, Andrei looked into the segfault Christian and Anant had both reported in GdkDrawable::draw_rgb_image_dithalign(). He ended by throwing out the *_dithalign() methods completely, and writing new wrappers for draw_rgb_image() and draw_rgb_32_image(). Christian wondered if this meant GdkPixbuf animation should work now, but Andrei was unsure whether it should; he only knew it hadn't worked on his box.

Short version: Getting closer all the time.

CVS: Mostly Unicode

Changes in CVS that you should probably be aware of include:

  • Bug #36869 (memory leak in output buffering when using chunked output) was fixed in HEAD and 5_1 [Tony]
  • There are several new ext/mbstring functions in CVS HEAD: mb_list_mime_names(), mb_strstr(), mb_strrchr(), mb_stripos() and mb_strripos() [Seiji Masugata]
  • ext/pdo bugs #35671 and PECL #6504 - caused by the fix for #35332 - were fixed in CVS HEAD (only) [Wez]
  • Also in ext/pdo, bug #36342 (ODBC won't let you bind variables by buffer after "long" columns) was fixed in CVS HEAD and 5_1. The maximum length for column names was increased as part of the same patch. [Wez]
  • Zend Engine bugs #36878 (error messages are printed even though an exception has been thrown) and #36897 (debug_print_backtrace() doesn't return void but array(0) {}) were fixed in the HEAD and 5_1 branches [Tony]
  • A build issue and bug #36887 were fixed in PHP_4_4 branch (only) [Tony]
  • Bug #36886 (User filters can leak buckets in some situations) was fixed in PHP_5_1 branch (only) [Ilia]
  • In ext/mysqli, bug #36922 (missing MYSQLI_REPORT_STRICT constant in userspace) was fixed in CVS HEAD and 5_1 [Tony]
  • ext/spl bug #36941 (ArrayIterator does not clone itself) was fixed in CVS HEAD and 5_1 [Marcus]

Derick Rethans queried the new additions to ext/mbstring, pointing out that the full Unicode support in CVS HEAD makes the extension obsolete there. Seiji replied that a) he couldn't add them into PHP_5_1 branch during the release process - but intends to following the 5.1.3 release - and b) not every application currently relying on mbstring functionality will immediately adapt to use Unicode when PHP 6 is released.

Meanwhile in CVS HEAD, Sara Golemon and Andrei both had another busy week. The chief change Andrei made that the development team needs to be aware of is the introduction of the U and S type specifiers in the parameter parsing API. These are intended for use when a function wants to accept only Unicode or binary strings (without type conversion).

Sara beavered away at her streams work, moving Unicode conversion to the filter layer. It is now possible to set encoding on a stream in userspace either by context:

$ctx = stream_context_create (NULL,array('encoding'=>'latin1'));
$fp = fopen('somefile', 'r+t', false, $ctx);

via stream_encoding():

$fp = fopen('somefile', 'r+');
stream_encoding($fp, 'latin1');

or through a filter:

$fp = fopen('somefile', 'r+');
stream_filter_append ($fp, 'unicode.from.latin1', STREAM_FILTER_READ);
stream_filter_append ($fp, 'unicode.to.latin1', STREAM_FILTER_WRITE);

She also made php_stream_passthru() Unicode-friendly, which means that the userspace functions fpassthru() and readfile() are up to date. readfile()'s signature was altered slightly along the way, and the optional boolean parameter use_include_path is now a bitmask "flags" parameter, with the value of FILE_USE_INCLUDE_PATH 'coincidentally' set at a BC-friendly 1. Sara went on to apply the same principle to file_get_contents(), again with that signature change.

Following this, Sara waded deep into streams support, adding API hooks and the new .ini setting unicode.filesystem_encoding - this to cope with Unicode conversions of filename entries. Protocols other than straightforward file:// can override the directive. She ended her week by rewriting a handful of tests for ext/bz2, noting in her commit message that 'compression is just a binary thing... Write unicode and suffer my wrath!'

Short version: Unicode-aware streams support is pretty much there.

PAT: Quiet week

Hannes Magnusson posted a patch [dead link] to make streams respect POSIX error retrieval functions. This would allow code like:

$fp = @fopen("/unwritabledirectory/filename", "w");
if (!$fp) {
    printf("The errornr was: %d", posix_get_last_error());
}

and would also fix bug #36868, which expresses the perceived need for it.

Wez reviewed the patch, and came back with the news that Hannes was probably attempting to capture the POSIX errno value too late in the procedure. In fact, he wrote, the failure code might not even be an errno; getaddrinfo(), for example, returns a failure code outside the errno "protocol", and many kinds of streams fail at a level that doesn't even allow errno to be set.

Short version: Streams stuff is more complicated than it looks.
