Zend Weekly Summaries Issue #362

TLK: in_class_exists
TLK: Square brackets [continued]
PAT: Reference cycle collector
REQ: with()
TLK: Number problems
REQ: PDO driver for U2
REQ: Overloading by method signature
CVS: Solaris back on the agenda
PAT: PDO attack

7th October – 13th October 2007

TLK: in_class_exists

Greg Beaver’s original post actually had a much less snappy subject line:
“[PATCH] in_class_exists() for detecting __autoload() called by class_exists()
and interface_exists()”. If you’re having a sense of déjà vu here, it’ll be
because the theorizing on the internals list was last week; most of the patch
discussion came later.

Greg explained that the function is intended for use in
__autoload() declarations, and posted a demonstration scriptlet:


<?php

var_dump(in_class_exists('test'));

function __autoload($class) {
    var_dump(in_class_exists($class));
    eval('class ' . $class . ' {}');
}

$a = new Bungalow; /* autoload called, not by class_exists(), bool(false) */
class_exists('test'); /* autoload called by class_exists(), bool(true) */
class_exists('test'); /* class exists, no autoload, no output */
$a = new test; /* autoload not called, no output */
$a = new Another; /* autoload called, not by class_exists(), bool(false) */

?>


The idea was to give autoload handlers the ability to detect when it’s safe
to terminate execution, for example by calling die(). Greg cited
differences in user expectations to back his argument: someone calling
class_exists() would expect a return value on
__autoload() failure, whereas someone instantiating an object
would expect a fatal error on __autoload() failure.

Greg also explained why he wanted the change; in PEAR2, he hoped to display
helpful information when class files were missing. At present, missing files
result in a less than helpful fatal error when __autoload() is
invoked as a result of a call to class_exists().

Stas Malyshev agreed with Greg’s point about user expectations, but felt it
was down to the Zend Engine to resolve the issue; it would be better to
prevent the autoloader ever producing errors. If the Engine really needed a
class that didn’t exist, it would bail out anyway. Stas didn’t see any
problem with having an autoloader print debug information if that was what
the developer wanted, but he also didn’t see why that facility should belong
uniquely to class_exists().

Greg pointed out that someone calling class_exists() wouldn’t
actually want extra output if the class was missing. He came up with
another possible approach; perhaps context information could be added to the
error string displayed by the Engine? Stas vetoed that swiftly; autoloaders
can be chained. It would be better to log it somewhere than create another
mini logging system. This was also why the class_exists()
argument didn’t hold water; the next autoloader in the chain may well find
it. Besides, debug features really should be handled at application level…

François Laupretre intervened to give Greg some much-needed support.
If the requested class has a prefix, such as ‘PEAR2’, by convention that name
space is reserved; it cannot be resolved by another handler. It was
therefore wholly legitimate to want the PEAR2 autoload handler to display a
useful and appropriate message. That said, he didn’t go along with Greg’s
approach to the problem; rather than attempting to enrich the interpreter’s
message, he’d have the handler trigger a warning. This would at least ensure
that the end user received useful information prior to the fatal error.
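
François’ suggestion might look something like the sketch below. It uses
spl_autoload_register() rather than __autoload(); the ‘PEAR2_’ prefix test,
the file layout and the wording of the message are all assumptions made for
illustration, not PEAR2’s actual loader.

```php
<?php
// Hypothetical handler for classes whose names start with "PEAR2_".
// The prefix check and the file layout are assumptions for this sketch.
function pear2_autoload($class)
{
    if (strpos($class, 'PEAR2_') !== 0) {
        return; // not ours: let any other registered handler try
    }

    $file = __DIR__ . '/' . str_replace('_', '/', $class) . '.php';
    if (is_file($file)) {
        require $file;
        return;
    }

    // The prefix is reserved, so no other handler can help: say why we failed.
    trigger_error("Class $class not found: expected it in $file", E_USER_WARNING);
}

spl_autoload_register('pear2_autoload');
```

With this in place, class_exists() on a missing PEAR2_ class still returns
false, but the warning is raised first; instantiating the class raises the
same warning just before the Engine’s own fatal error.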

Stas argued that, in such a case, the autoloader should be able to prevent
the Engine trying another handler. Alexey Zakhlestin wasn’t sure that was a
good idea. It might be better to allow different autoload functions to be
registered on a per-namespace basis. Warming to his theme, Alexey added that
this could be done by extending the spl_autoload_*() function
family and leaving autoload_register() at its current level.
Stas wondered why it had to be per-namespace? Besides, Alexey’s was a complex
solution to a simple problem; the autoloader merely needed the ability to
declare that it was the only possible loader for a given class, and give the
reason for failure as appropriate. Marcus agreed with Stas; it should be kept
simple. The idea of tagging a class as ‘required’ or ‘optional’ might be
useful here. Stas doubted it; why would anyone try to load a class they
didn’t need? Greg mentioned the words ‘optional dependency‘ at this
juncture. Stas denied all knowledge of such a thing, given the context.

Short version: The answer’s still no.

TLK: Square brackets [continued]

One Emil Hernvall didn’t care that half the core development team dismissed
the idea of supporting substr()/array_slice()
functionality in square brackets as ‘too Perl-ish’ last week. In
his view, it would be incredibly useful when handling regular
two-dimensional database result sets. He’d like to be able to do:


$result = $pdo->fetchAll(PDO::FETCH_ASSOC);
$result = array_combine($result[]['my_id'], $result);


If Emil were to prepare a patch that made this possible, would there be any
objections to its inclusion in the PHP core?
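
For comparison, here is one reading of what Emil’s expression is meant to do,
written without any new syntax. It assumes that the empty brackets mean ‘the
my_id value of every row’; the sample rows are invented, and array_column()
only arrived in later PHP versions (5.5 onwards).

```php
<?php
// Invented sample rows standing in for a fetchAll(PDO::FETCH_ASSOC) result.
$result = array(
    array('my_id' => 7, 'name' => 'foo'),
    array('my_id' => 9, 'name' => 'bar'),
);

// Pull the my_id value out of every row, then use those values as keys.
$ids = array_column($result, 'my_id');
$result = array_combine($ids, $result);

// $result is now keyed by my_id instead of by 0 and 1.
```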

Rasmus Lerdorf promptly objected, on the grounds that ‘I have no idea what
that syntax should do by looking at it’. Since the [] array
syntax has always been a write-only auto-incrementing construct, and since it
makes no sense to auto-increment on a read, the proposal equates to
overloading that syntax to mean something completely different. Tony Dovgal
wasn’t overly enthusiastic either, and gave it the thumbs down.

Andrei Zmievski meanwhile caught up with his mail backlog and started firing
off emails to all and sundry arguing for the right to be Perl-ish. He didn’t
see that


$foo = $bar[:5] . "-" . $bar[5:];


was any more cryptic than


$foo = substr($bar, 0, 5) . "-" . substr($bar, 5);


and besides, ‘if we were concerned about duplicate functionality, we
probably wouldn’t have SimpleXML’. Tony, who reckoned that
$bar[:5] looks like a typo for $bar[':5'], wrote
that old line about “just because there have been mistakes made in the
past…” Naturally enough, Alexey challenged him on that point, but managed
to write an old line of his own about “it’s easier to learn the simple new
operator than the existing functions and operators that do the same task”.
Tony pointed out that the old and the new would co-exist and every PHP
developer out there not coding in a bubble would need to learn both.

A few of the bubble people continued to argue that the operator syntax is so
much cleaner. Scott MacVicar pointed out the difference between an operator
that has existed in a language since inception and one introduced into a
language ten years or so down the line. Most people would end up using the
older method to support back compatibility, and those that didn’t would
simply cause confusion. Stas mentioned a language whose name he didn’t want
to mention ‘on a family-oriented mailing list‘, which has the same
syntax mean two different things, observing that this didn’t qualify as
“easy” in his book. He’d be ‘kind of OK‘ with the idea of making the
substring operator {X:Y} (did he mean [X:Y]?), but
didn’t see how that was an improvement over calling
array_splice(). Besides, it simply wouldn’t make sense in many
array contexts; it wasn’t a natural extension of the existing square brackets
syntax.

Short version: Only Andrei (and the bubble people) love Perl.

PAT: Reference cycle collector

David Wang finally got his long-awaited CVS account, and promptly committed
his macros for manipulating refcount and is_ref into CVS HEAD and the PHP_5_3
branch. (Farewell PHP_5_2 compatibility.) This being done, he posted his
garbage collection patches for the same two branches to the
internals list for review. The complete Zend Engine files containing the
guts of his GC implementation were posted separately.

With the mechanics out of the way, David announced that the garbage collector
he was now presenting was not the one interested parties may have been
following in his SVN repository. This version came from an experimental
branch, and he believed it to be more tolerant of ‘zval juggling‘.
Finally, David recommended an additional patch (again for both the PHP
development branches) that would force the compiler to always inline
functions marked as inline.
He’d found that compilers tend to ignore the inline keyword when
faced with large object files in the Zend Engine, making PHP run much slower
than it should when his garbage collector produced such files.

Cristian Rodriguez was overjoyed, and promised to test the patches as soon as
he was able. Andi Gutmans played it a little cooler, promising to test the
patches in the Zend labs immediately after the upcoming Zend/PHP conference.

PHP user Tony Bibbs wanted to know if the patch would fix bugs #33595 (Recursive references leak
memory) and #33487 (Memory allocated
for objects created in object methods is not released). Alexey advised him
that it would fix the former, but not the issue of PHP’s inability to free
unused memory during script execution, and pointed out that this wasn’t
necessarily a bug in the first place. David wrote that he’d created a
userland function to manually free the memory occupied by garbage cycles,
gc_collect_cycles(), which would resolve that issue – assuming
the behaviour is still problematic once his patches have been applied. He
didn’t believe it would be.
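
A minimal sketch of the kind of garbage cycle under discussion, using
gc_collect_cycles() as it eventually shipped in PHP 5.3. The Node class and
the make_cycle() helper are invented for the example.

```php
<?php
class Node
{
    public $peer;
}

// Build two objects that point at each other, then let every outside
// reference go out of scope. Reference counting alone cannot free them.
function make_cycle()
{
    $a = new Node;
    $b = new Node;
    $a->peer = $b;
    $b->peer = $a;
}

gc_enable();
make_cycle();

// Ask the collector to look for unreachable cycles right now.
$freed = gc_collect_cycles();
echo "collector freed $freed value(s)\n";
```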

Short version: It’s a bit like watching brain surgery in action.
Terrifying.

REQ: with()

There must be something about the name Sebastian. One of them (not Bergmann)
wrote to internals@ suggesting that it might be a really neat idea to
introduce JavaScript’s with() function into PHP. Basically it
allows you to call several class methods but only mention the class name
once, so instead of:


$class=new class;
$class->do_something();
$class->do_more();
$class->do();


you’d write:


$class=new class;

with($class) {
    do_something();
    do_more();
    do();
}


Self-confessed lurker Richard Black wrote that this feature exists in Delphi
too, and has been a source of confusion to him: ‘the separation of object
and method leaves you unsure of what is actually being called’. To
illustrate the problem, Richard asked which do() is being called
here? (Note: you’re only allowed a quick glance).


function do() {
    echo "hello";
}

$class=new class;

with($class) {
    do_something();
    do_more();
    do();
}


He added that it gets harder if there’s the option to nest
with() constructs.

Hartmut Holzgraefe was surprisingly positive about the proposal, initially at
least. He felt that with() ‘could make sense as a convenience tool
if used correctly’, but agreed with Richard that it could create ‘a
maintenance nightmare’ otherwise. However, he went on to wax lyrical about
the clarity of PHP scoping, as compared to scoping in other languages, and
concluded that it might be a bad idea to weaken that concept.

Someone calling themselves “BDKR” came out of the woodwork to back Hartmut’s
points about the explicit nature of PHP.

OO fan Ralph Schindler wanted to know how with() wins over
fluent interfaces, which are already possible in PHP… allowing code like


$bar->doSomething()->doMore()->do();


but that, as Stut pointed out, is a chain call that will only work if the
methods in $bar return $this.
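
Stut’s caveat is easy to see in a small sketch. The Bar class and its methods
are invented for the example; the only point is that each method ends with
return $this, which is what makes the chain possible.

```php
<?php
// Hypothetical class: each method returns $this so calls can be chained.
class Bar
{
    public $log = array();

    public function doSomething()
    {
        $this->log[] = 'something';
        return $this;
    }

    public function doMore()
    {
        $this->log[] = 'more';
        return $this;
    }
}

$bar = new Bar;
$bar->doSomething()->doMore()->doSomething();
```

Drop either return $this and the next call in the chain is made on null
rather than on the object, which fails at run time.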

Stut also came up with the interesting observation that just because
something wasn’t obvious didn’t mean it shouldn’t be implemented. Besides, it
seemed obvious enough to him, with or without nesting. He saw great
value in not having to repeat references to the same object over and over
again throughout a script. Rasmus wrote that something being less than
obvious was one of the strongest reasons not to implement it, in his view. It
would simply obfuscate the language. Such considerations apart, from an
internals perspective a feature like with() would necessarily be
much slower than the current straightforward hash lookup. It would mean
walking up the tree of objects to check whether the method exists, and
assuming a straight function call if it doesn’t – unless there happened to be
a __call() method, and then what? Stut admitted that he didn’t
know enough about PHP internals to make an informed judgement about the
impact on performance; he also hadn’t considered __call().

Lars Gunther had evidently been making preparations for a longer battle over
this, and produced several authoritative references to the evils of with()
in JavaScript to kill the topic dead.

Short version: without()

TLK: Number problems

XML guru Rob Richards posted a query on the list. Why are objects in PHP
always converted to long when performing arithmetic operations?

Rob had been looking into SimpleXML bug #42780 (Invalid type
casting?), which complains that the only way to retain precision is to
explicitly cast the object to float or string. This
didn’t seem right to him, and he’d been mulling over the possibility of having
an object handler that could be used by the object to determine a numeric
return type – something like to_numeric(). Such a handler would
allow object arithmetic to work in the same way that arithmetic currently
works with is_numeric strings, since it would offer a means of
“proper” conversion.
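
The workaround the bug report complains about looks roughly like this. The
XML document and the element name are invented; the explicit (float) cast is
the part that matters.

```php
<?php
// Invented document; the element name "price" is just for illustration.
$xml = simplexml_load_string('<item><price>1.5</price></item>');

// Casting explicitly keeps the fractional part before doing arithmetic.
$price = (float) $xml->price;
$total = $price * 2;

var_dump($total);
```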

Nobody picked up on Rob’s post, probably because those likeliest to be able
to respond were tied up with the Zend conference at the time.

Someone named Maurice didn’t get a response either with his ‘is this a
bug?’ post about integer overflows. He’d found that PHP 5 uses 64-bit
integers when built on 64-bit systems (good). However, on a 32-bit system,
PHP 5 wasn’t always using a 32-bit integer (bad). Integers higher than
32-bit, such as -17441010873, should be “overflowed” to -261141689 on a
32-bit system, but are actually mapped to INT_MIN, which
translates to -2147483648, under Linux. On some systems, PHP 4 is broken in
the same way.
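
Maurice’s ‘expected’ figure can be checked by hand: 32-bit wrap-around keeps
only the low 32 bits, and since 4 × 4294967296 = 17179869184, the result is
-17441010873 + 17179869184 = -261141689. The helper below performs the same
reduction. wrap32() is a made-up name, and the sketch assumes a 64-bit PHP
build so that the input fits in an integer to begin with.

```php
<?php
// Hypothetical helper: reduce an integer to its signed 32-bit
// two's complement value. Assumes a 64-bit PHP build.
function wrap32($n)
{
    $n &= 0xFFFFFFFF;          // keep the low 32 bits
    if ($n & 0x80000000) {     // sign bit set: negative as a 32-bit value
        $n -= 0x100000000;
    }
    return $n;
}

var_dump(wrap32(-17441010873));
```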

Maurice had also tested for this behaviour on Windows XP, and found
everything working correctly there. He helpfully included his test script so
that others could check the results on their own set-up:


<?php

$a = -17441010873;
$b = (int)-17441010873;

echo "expected on 32-bit systems:\n";
echo "float(-17441010873)\n";
echo "int(-261141689)\n";
echo "\n";

echo "expected on 64-bit systems:\n";
echo "int(-17441010873)\n";
echo "int(-17441010873)\n";
echo "\n";

echo "result:\n";

var_dump($a);
var_dump($b);

?>


Short version: You can’t know how long I’ve waited to see a Linux-only bug…

REQ: PDO driver for U2

One Claude Masseron of IBM wrote to ask about the steps needed to get a PDO
driver for the IBM U2 databases into the PHP core distribution. Once again,
this was the wrong week to ask… and possibly a little premature, since there
is no PDO_U2 driver in PECL. If you want to get hold of the source for the
driver, the only way to do so at present is through an article about the
extension on IBM developerWorks.

Short version: The pecl-dev list is
there for exactly this kind of question.

REQ: Overloading by method signature

Hans Moog hoped to see overloading by method signature supported in PHP; he
believed it would ‘help developers to write better code‘. That would
be code like:


class Test extends TestClass {

    public string function test(integer $int) {
        return "Hi";
    }

    public integer function test(string $string, integer $int) {
        return $int;
    }
}


Alexey immediately replied that ‘such overloading would be incompatible
with PHP’s dynamic nature’ and further, that even type hinting for basic
types had been rejected in the past. Hans wasn’t sure about this ‘PHP way’
business, at all, and wrote that he could easily provide a patch to support
the code in his example. In fact, it seems he’s already using it at work.

Alexey wanted to know how Hans’ patch would handle the situation where too
many arguments were passed to an overloaded method, but Hans explained that
it would simply throw an error. However, the patch also offered a way to
describe a method that would accept any number and any type of arguments:


public function fun1($parameters ...) {
    var_dump($parameters);
}


where the “...” notation makes it possible to have any number of
arguments of the given type, and a missing type hint defaults to “mixed”. You
could even combine ... with typed parameters to create a
function signature like:


public function fun1(string $firstParam, string $remainingParameters ...) {}


varargs, basically.
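
PHP itself later gained a comparable notation, with the dots placed before
the parameter name rather than after it (variadic parameters arrived in PHP
5.6, scalar type declarations in PHP 7.0). The sketch below uses that later
syntax, not Hans’ patch, and the function name join_words() is invented.

```php
<?php
// One required string, then any number of further strings,
// which PHP collects into the array $rest.
function join_words(string $first, string ...$rest)
{
    return implode(' ', array_merge(array($first), $rest));
}

echo join_words('varargs', 'in', 'PHP'), "\n";
```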

Short version: You might as well just call it “C” and have done with it.

CVS: Solaris back on the agenda

Changes in CVS that you should probably be aware of include:

  • The PCRE extension in PHP_5_3 branch now relies on
    pcrelib version 7.4 for its functionality. The extension also
    got two bug fixes: bugs #37911
    (preg_replace_callback() ignores named groups) and #42737 (preg_split('//u')
    triggers an E_NOTICE with newlines) [Nuno]
  • A fix for bug #41822
    (Relative includes broken when getcwd() fails) means that
    Solaris should be better able to cope in all three current PHP branches
    [Robert Thompson, Jani]
  • Another Solaris fix, this time closing safe_mode bug #41899 (Can’t open files with
    leading relative path of ‘..’ and ‘..’ is not readable) in 5_2 and 5_3
    [Robert Thompson]
  • In ext/mysql, bug #42890 (Constant LIST defined by mysqlclient and
    c-client) was fixed in the PHP_5_2 and PHP_5_3 branches
    [Andrey Hristov]
  • The PDO_ODBC driver now has iODBC support in the PHP_5_3 branch [Wez
    Furlong]
  • In ext/mysqli, the 5_2 branch now has the
    MYSQLI_SET_CHARSET_NAME option that came into the rest of the
    branches along with mysqlnd support [Scott MacVicar]
  • Session bug #42869 (Automatic session id insertion adds sessions id to
    non-local forms) was fixed in 5_2, 5_3 and CVS HEAD [Ilia]
  • The debug_info helper in overloaded objects was
    backported to the PHP_5_3 branch [Marcus]
  • In PDO, bug #42917 (PDO::FETCH_KEY_PAIR doesn’t work with setFetchMode)
    was fixed in the PHP_5_2 and PHP_5_3 branches [Ilia]
  • In PHP_5_3 and CVS HEAD, icon file format support
    (image/vnd.microsoft.icon, .ico) was added to the core
    function getimagesize() [Scott MacVicar]

Robert ‘Solaris’ Thompson, having only just got his CVS karma, didn’t
immediately understand that he was still meant to post his patches for review
prior to committing them. He also hadn’t found the README.CVS-RULES or
CODING_STANDARDS files. Tony and Jani subsequently joined forces to
educate him. You have to feel for that poor guy…

Short version: Some surprising backports went into CVS this week.

PAT: PDO attack

PEAR Group member Martin Jansen suddenly started bombarding the internals
mailing list with patches.

Martin started small, removing a stray backslash in the ext/mysql
source. Tony promptly applied that one. Martin’s next outing was slightly
more ambitious; it unified phpinfo() output for the PDO drivers
so that they all include the PECL module version string and the CVS
identifier. Stas thought this a good idea, but didn’t have time to check the
full patch when it came. Marcus did, eventually, and wrote that the
$Revision: $ keyword would be enough. He also noted that
whitespace was broken throughout the patch.

In the meantime Martin had posted another patch, this for the PDO_MYSQL and
PDO_SQLITE drivers. The patch was in response to feature request #42589 (getColumnMeta()
should also return table name), but Martin admitted that he’d had problems
getting it to work with SQLite, and his solution wouldn’t work at all outside
the bundled library. He later discovered that the relatively straightforward
MySQL driver already had the feature in its PECL version anyway.

Finally, Martin offered up a trivial fix for bug #42322 (PDO_MYSQL
fetch() throws exception on INSERT statements)
against the PHP_5_3 branch, noting that he’d only just learned there’s no
point in patching PDO in CVS HEAD.

Short version: The silence was deafening – but then, there was
nobody home.