Zend Weekly Summaries Issue #300

TLK: zend_u_strtod
TLK: Windows installer [continued]
RFC: OO inheritance strictness revisited [cont]
REQ: zend_string_to_number
NEW: PHP 4.4.3
TLK: Method inheritance and parent classes
CVS: 4_4 branch now open for fixes
PAT: Long-awaited line directive [continued]

TLK: zend_u_strtod

Working his way through the quagmire that is large doubles in base conversion
despite everyone else’s complete lack of interest, Matt Wilmas noticed that it was
much slower to convert Unicode strings to double than to convert standard binary
strings. In fact, it was faster to convert a Unicode string to a binary string and
use the regular zend_strtod(). He’d pondered on this for a while before
attempting to modify zend_u_strtod() to do just that where relevant,
and now presented a patch containing the result. In Matt’s tests,
zend_u_strtod() was now 8 times faster on average – and nearly 20 times
faster when fed a non-numeric string. Was the patch usable?

Andrei Zmievski tested it, and found that the numeric Unicode string to double
conversion was only twice as slow as the binary equivalent – a huge improvement on
the current situation. If this was acceptable to the team then yes, the patch should
be committed. If not, zend_strtod() should be ported to deal with
Unicode strings directly.

Matt, wondering why the code behind zend_strtod() is so complicated,
pointed out that zend_string_to_double() has most of the same
functionality and is relatively straightforward. Wouldn’t it make sense to implement
zend_u_strtod() to use the methods zend_string_to_double()
uses? Andrei referred him to Derick Rethans, who contributed that code, asserting
(tongue firmly in cheek) that zend_string_to_double() ‘is not actually used
anywhere’. Derick confessed to having ‘stolen’ it from BSD, and
pointed out that the function is actually needed on some architectures.
Andrei hoped Derick would port it to support UChar* himself, but no
response was forthcoming. He ended up rewriting the long, double and resource to
Unicode conversions to use custom functions rather than u_sprintf(),
noting in his commit message that long-to-unicode conversion is now only 8% slower
than long-to-binary, and double-to-unicode is actually 6% faster.

Short version: The man went to pick up a penny and found a
pound.

TLK: Windows installer [continued]

John Mertic, still hard at work on the new installer, announced that he’d made a
few more tweaks. There is now a dedicated dialog for choosing the install directory;
the Apache configuration contains some debugging; and the PHPIniDir
directive, now supported in Apache 1.3.* by PHP 5.2, is always set in Apache module
installs. The latest version is available here, thanks to Edin Kadribasic.

Richard Lynch, reading the thread history, wrote in to say that given his
experience with Windows users on the PHP general mailing list, anything John could
manage to compile and test should be downloaded as part of a full install. Many
users were simply unable to track down the DLL files they needed. On the other hand,
a ‘lite’ user was bound to need a DLL that isn’t in ‘lite’ at some point, leading to
similar issues. Richard suggested that the ‘lite’ install should contain nothing
beyond php.exe, php5ts.dll and the SAPI modules, and clear,
version-specific instructions about locating, downloading and installing anything
else the user might need.

Wez Furlong dismissed the idea of a ‘lite’ installer, pointing out that anyone
needing an installer in the first place would be likely to want
everything.

Off-list, Christopher Jones had joined the small group of testers and debuggers
focusing on John’s work at that point. He suggested that John use native tools
during the install and configuration, rather than rely on a copy of PHP that was
mid-install itself. Chris also suggested logging the entire installation process to
install.log, regardless of whether there had been any errors: ‘It is so much
easier to help people when you can say “check the log file for problems”’. Then
we were back to the issues arising from the Apache configuration’s need for a
terminating slash on some (but not all) paths and in some (but not all) situations,
and my, how the test group’s inboxes overfloweth.

Short version: It’s looking really nice.

RFC: OO inheritance strictness revisited [cont]

Having returned from his holidays, Richard Lynch got back into form by taking up
the ages-old OO inheritance strictness thread. He didn’t get very far – about as
far as Lukas Smith’s first
response – before bursting out with his opinion that the question was about the
direction PHP development should take. Java? Or something with a touch more Common
Lisp and Scheme in it, to create something new? Would there be polymorphism in PHP
7? What would it look like? For himself, he’d rather not have PHP force the same
arguments on same-named subclass methods; what exactly was the gain from that?
Richard saw none, but ‘obviously somebody somewhere thinks it’s a Good Idea for
some reason’. Clarification would be good.

Lukas replied cautiously that this had been covered already in the thread, but
proceeded to give an overview of the OOP concept of child objects anyway. He
mentioned in passing that he’d like to revive the call for an optional strict mode,
allowing PHP to break the rules in the time-honoured manner by default.

Daniel Convissor came in to agree with a point Pierre-Alain Joye had made much,
much earlier about the role of interfaces in the academically pure OOP approach. Why
should anything more be necessary?

Marcus Börger grumbled that nobody was being forced to use OO, but if you wanted
to use it you couldn’t then ignore its basic rules – ‘it is as it is’. Where
should the line be drawn? Why should the PHP development team be charged with
reinventing OOP theory and then making it work? Besides, doing so would make the
language overly complicated, as well as breaking the entire history of back
compatibility. Christian Schneider took issue with this, pointing out that PHP in
fact could and did allow method signatures to be altered before, and he still had
heard no convincing reason for the change. After all, nobody was being forced to
change the signatures either. Why couldn’t it be accepted that different coders
simply use classes in different ways? He ended with a battle cry:

When the hallelujah chorus had died down and Pierre had had a fresh
<rant />, Rasmus Lerdorf shared a few insights regarding the
changing demographics of PHP usage. Expectations had changed; ‘We have kids
coming out of universities today who barely know what procedural programming
is’. Since all those kids know is OOP, in order to meet their expectations PHP
needs not to ignore too many OOP rules. It might be fine to loosen up some of those
rules where it makes sense to do so, but ignoring them completely was not a valid
approach. Christian suggested providing a superset of the expected behaviours and
pointed out that ‘they can still use E_NOTICE for bondage mode’.

Robert Cummings, responding to Marcus, wrote that PHP might be as it is, but it
is certainly not as it was. PHP 4, he wrote, allowed signature mismatching. Further,
the reasons for reinventing OOP theory were obvious to him: ‘following all the
sheep out there just makes for more fodder’, and innovation almost always comes
from changing the game, rather than from following the pack. Unfortunately, Marcus
took this as a personal attack. He hurled back a furious response claiming that PHP
5 currently has a better iterator implementation than any other language, and
suggested that Robert ought to be helping the development team rather than arguing
with its members. Robert hurriedly apologized on both counts and said a lot of very
very nice things about both PHP and its developers.

Mike Wallner, also responding to Marcus, wrote an impassioned plea for
freedom:

C++ fan Soenke Ruempler woke up to the fact that the discussion had come back
to life, and started talking about parameter polymorphism again, just as
Derick got into telling Robert there had never been OO support in PHP 4.

Lukas pointed out tersely to Rasmus that PHP has a responsibility to its existing
user base as well as the newcomers. Derick felt that teaching those users how OO
should work is being responsible, and said something else derogatory about OO
support in PHP 4. Lukas argued that ‘proper OO’ is an entirely different coding
discipline; it requires a planned inheritance structure and a parent interface with
the simplest possible signature. It may not be feasible to change the parent’s
signature if the code base expands. This leads to developers adding any new methods
they might need to the child class, purely to avoid breaking the
instanceof relationship. The end result of this approach is several
unrelated methods residing alongside several related methods that are rarely, if
ever, used – and for most of the used methods, the instanceof
relationship does not enter into the equation. To Lukas, PHP as a rapid glue
language should stress freedom above all. Although those users wanting to use PHP as
a ‘proper OO’ language would benefit from OO strictness, if willing to put in the
extra planning time, alienating the existing userbase by making it impossible to
retain their own approach seemed a bad idea to him. Derick suggested an
‘i-don’t-want-to-care-about-oo-in-my-classes’ mode, which was more or less what
Lukas had been aiming for all along, except that Lukas wanted this to be the default
behaviour. Christian felt this was overkill.

Zeev Suraski was interested, however. He felt it could be even more fine-grained;
it could be on a per-method basis, since that’s the level at which the developer
knows whether overriding methods need to adhere to the parent signature. He made a
brief outline of recommendations:

People working within OO restrictions would be likely to run
E_STRICT clean code, so providing userland support for tagging classes
or methods as strict would probably be unnecessary.

With that out of the way, Zeev found Derick’s statement(s) that PHP 4 didn’t have
OO support and called him out over it: ‘It sure as hell did’. Derick retorted
that he could hardly call an array with another table with function pointers OO.
Zeev promptly proved that he could (if amending it to ‘cooperative OO’ along the
way), and pointed out that a lot of people out there are in fact using PHP 4 to
implement common OO practices. However, finding a compromise that would give both
camps a workable solution made good sense to him.

Ron Korving was also looking for a compromise. He came up with support for method
overloading, saying that it would be extremely rare for anyone to want to combine
this with the conflicting func_get_args(). Inherited abstract methods
would be the exception to the rule, and could not be overridden. Hartmut Holzgraefe
pointed out that this would require runtime checks on every single method call,
given PHP’s dynamic typing. Richard Quadling thought it could work at compile time
if PHP weren’t dynamically typed; or if type hinting were available for all types,
with E_STRICT being used as a kind of switch for this. Hartmut argued
that this would mean almost a 100% rewrite of PHP; changing the underlying type
concept in an incompatible way is not the same thing as refactoring; and the end
result probably shouldn’t be called PHP anyway.
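For readers unfamiliar with the conflict Ron mentioned, the sketch below shows the loose idiom that declared overloads would collide with: a single function body that inspects func_get_args() and branches on the number of arguments it was given. The function name area() and its behaviour are hypothetical, chosen only for illustration.

```php
<?php

// Hypothetical example: one body that serves several call shapes.
// Nothing in the declaration tells the engine how many arguments are valid,
// which is why declared overloads and func_get_args() sit awkwardly together.
function area()
{
    $args = func_get_args();

    switch (count($args)) {
        case 1:
            // One argument: treat it as the side of a square.
            return $args[0] * $args[0];
        case 2:
            // Two arguments: treat them as width and height.
            return $args[0] * $args[1];
        default:
            throw new InvalidArgumentException('area() expects 1 or 2 arguments');
    }
}

echo area(3), "\n";    // prints 9
echo area(3, 4), "\n"; // prints 12
```

The choice between the two behaviours is made at run time inside the function, which is the same kind of per-call work Hartmut expected engine-level overloading to need in a dynamically typed language.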

Zeev meanwhile had caught up with Rasmus’ explanation about the new generation of
PHP programmers, and responded to that:

Rasmus felt there was much to be said for having the choice, and gave
Marcus full credit for that (something Zeev was quick to agree over). He agreed with
Zeev that it should be possible to mark certain internal methods as strict but keep
userspace methods loose. Pierre backed them both, pointing out that many have asked
for this.

Everybody else was still focusing on the recommendations Zeev had sketched out.
Mike felt the proposed E_ERROR situation is already covered by
interfaces. Derick just wanted an example of the new flag. Robert suggested it might
look something like:


<?php

class foo {
    function bleh($p1, $p2) {
        echo "Bleh: $p1, $p2\n";
    }
}

class fee extends foo
{
    loose function bleh($p1) {
        parent::bleh($p1, $this->prop);
    }
}

?>

He decided he rather liked that, but Pierre suggested it should be the other way
around – the default behaviour should be loose, and the flag should be
strict. Zeev confirmed that Pierre’s version was closer to his
intention.

Derick, catching up with his mail again (this thread ran all week and beyond),
wrote in response to the agreement over keeping userspace methods loose. He wanted
to see at least an E_STRICT warning when signatures are violated –
something to aid the ‘strictness’ fans along, but not get in the way of ‘the loose
people’ (Zeev’s term). He believed Zeev had in fact suggested something like this.
Zeev agreed, clarifying things for Pierre, who had assumed there would be no change
at all for loose code. Although unable to prove it, Zeev’s instinct was that there
is probably ‘a very strong mapping’ between those who care about strict
coding and those who have E_STRICT enabled. In fact it probably wasn’t
essential to add any new modifiers, loose or strict…

Pierre pointed out that E_STRICT currently catches many issues that
are not OO-related. Furthermore, like many developers, he himself tends to develop
under the most verbose error reporting available, in the hope of preventing future
technical problems. He wouldn’t consider this particular issue a technical problem –
but having such annoying E_STRICT warnings would prevent his using the
E_STRICT error level.

Lukas made a good point:

Come to that, he’d like to see a clear, written policy regarding
E_STRICT warnings and the way they will be handled in future releases.
Regarding the subject in hand, he still preferred the idea of having a flag on a
per-class basis, but indicated that this change would also be acceptable to
him.

Derick didn’t think the per-class flag was a good idea at all, pointing out that
someone somewhere would need to modify their source files if the code needed to run
on older versions of PHP. Pierre didn’t buy that argument, since existing code would
be supported in the default mode. If someone wanted to write strict code, they
should require PHP 5.2+ for their applications. Christian pointed out that the same
objection is equally valid for PPP (visibility). That said, he would also prefer to
utilize E_STRICT than bring a new modifier into the language, but
shared Lukas’ concerns over E_STRICT. Whichever way it went, he’d
prefer to not test static methods and to allow adding default values to methods, so
that the object of an inherited class could accept the same parameters as the base
class. In fact, he attached a patch to
implement this.
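Christian's point about default values can be sketched as follows. Child::render() adds a parameter with a default, so any caller written against Base still works when handed a Child. The class names and the show() helper are hypothetical, and the sketch assumes a PHP version that accepts an extra optional parameter in an overriding method; it is not taken from his patch.

```php
<?php

class Base
{
    function render($text)
    {
        return "[" . $text . "]";
    }
}

class Child extends Base
{
    // The extra parameter has a default, so a call with one argument
    // (the shape Base promises) is still valid on a Child object.
    function render($text, $prefix = "> ")
    {
        return $prefix . parent::render($text);
    }
}

// Hypothetical caller that only knows about the Base signature.
function show(Base $b)
{
    return $b->render("hi");
}

echo show(new Base()), "\n";  // prints [hi]
echo show(new Child()), "\n"; // prints > [hi]
```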

Richard Quadling wondered whether having parameter overloading dependent on the
E_STRICT setting would mean that developers needed to toggle
E_STRICT off and on at runtime? Derick replied that it would only be an
issue for the opening script, not for included files:


<?php

error_reporting(0);
include 'included.php';

?>

Richard, who hadn’t thought of that, realized he could also use
__autoload() to distinguish between strict and non-strict classes and
set the error_reporting level prior to the include() call.
Derick agreed that this was also an option.

Jochem Maas meanwhile had picked up on Lukas’ call for E_DEPRECATED.
He wrote that he’d tried to point this out before; as a ‘loose’ coder, he uses
E_STRICT purely because it warns about deprecated code. Having
E_DEPRECATED as a separate error level would prevent his ever having to
look at E_STRICT at all. Ron didn’t like the idea that you’d need to
suppress an error message in order to write ‘loose’ code:

He and Jochem agreed that it would be better to throw an
E_NOTICE than an E_STRICT for strict OO recommendations,
but it would be preferable to both of them not to rely on this.

Rasmus came back into the conversation to agree with Derick and Pierre that a new
keyword wasn’t an optimal solution, and asked whether interfaces couldn’t be used to
indicate strictness? Pierre by then had decided there wasn’t a good alternative to a
keyword; it seemed that interfaces alone do not resolve the problem, although he
hadn’t yet understood why. Rasmus (also not an OO man) thought it was probably
because interfaces don’t care about method signatures, and asked the OO people
whether there are reasons beyond this. Jochem pointed out that actually interfaces
do care about method signatures already, and proved it:


# php5 -r 'interface Foo { function bar($v,
$k); } class Qux implements Foo { function bar() {} }'
Fatal error: Declaration of Qux::bar() must be compatible with that of
Foo::bar() in Command line code on line 2

# php5 -r 'interface Foo { function bar($v, $k); } class Qux implements Foo {
function bar($a, $b, $c) {} }'
Fatal error: Declaration of Qux::bar() must be compatible with that of
Foo::bar() in Command line code on line 2

Stefan Walk grumbled that ‘use interfaces’ would mean having to write an
interface matching every public method in a class to attain strictness, and another
for every inheriting class that adds a new method. To him, this was not a solution.
It would mean code duplication!

Derick wrote that the problem with using interfaces was that existing code would
still need to be modified to attain strictness. He has code running under PHP 5.1,
and would like it to retain 5.1 compatibility. If in PHP 5.2 he needed to implement
a specific interface to get strict behaviour, that code would not run under PHP 5.1
without hacks. The other option would be not to mark anything as strict, which would
make debugging painful. The first three items in Zeev’s proposal, on the other hand,
would give him no real technical problems. All he’d need to do is ignore
E_STRICT if he wanted to violate OO rules.

Lukas reiterated that he was fine with that proposal, so long as
E_STRICT is split into actual E_STRICT (meaning that the
code is not strict OO) and E_DEPRECATED (meaning that the feature will
be dropped in the next major PHP release). He added, just in case it wasn’t clear to
anybody, that the first two points in Zeev’s proposal apply to classes at the C
level; ‘userland classes will only trigger E_STRICT if they are
improperly written’. He wondered whether it would be possible to get information
about internal class strictness through the Reflection API?

Andi Gutmans, writing that it had taken a while to plough through ‘the
gazillion emails on the topic’, agreed with the view that it should be possible
to find some middle ground that would make everyone happy. However, he didn’t think
that making interfaces have additional semantics, or having their signature
semantics different from those of abstract or standard classes, was the way to
achieve this. He took the ideas from the thread that he liked, and made a new set of
proposals from them:

Zeev and Derick both backed Andi’s proposals in full.

Short version: E_STRICT is about to come into its
own.

REQ: zend_string_to_number

Blithely ignoring WWIII, Matt sent in a post about string-to-number functions in
PHP. He thought that is_numeric_string() and
is_numeric_unicode() could both be optimized if a pair of new internal
functions, zend_string_to_number() and
zend_unicode_to_number(), could be added into the mix. This would allow
is_numeric_* to make a single function call that would be responsible
for handling both long and double values, thereby avoiding
unnecessary calls to strtol() or zend_strtod(). However,
in order to achieve this, doubles would need to be handled ‘correctly’ using simple
code, as in the existing zend_string_to_double(). The function
prototype he had in mind looks like:


zend_uchar zend_string_to_number(char *str, char **end, int base, long *lval, double *dval)

Matt added that he would have found such a function useful in his experiments to
update the dec*() number conversion functions to handle large values.
The existing is_numeric_string() and
convert_scalar_to_number() allow hexadecimal strings; he needed
something more like a plain convert_to_number().
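The shape of Matt's proposal, one call that reports which kind of number it found and fills in the matching output, can be sketched in userland PHP. This is only an analogue and not the proposed C function: the name sketch_string_to_number(), the NUM_* tags standing in for the zend_uchar return value, and the decision to drop the end and base parameters are all assumptions made for illustration.

```php
<?php

// Hypothetical type tags standing in for the zend_uchar return value.
define('NUM_NONE', 0);
define('NUM_LONG', 1);
define('NUM_DOUBLE', 2);

// Userland analogue only: classifies $str and fills $lval or $dval.
function sketch_string_to_number($str, &$lval, &$dval)
{
    // Plain decimal integer syntax; hexadecimal is deliberately not accepted.
    if (preg_match('/^[+-]?[0-9]+\z/', $str)) {
        $n = $str + 0; // int when it fits, float when the value overflows
        if (is_int($n)) {
            $lval = $n;
            return NUM_LONG;
        }
        $dval = $n;
        return NUM_DOUBLE;
    }

    // Decimal floating-point syntax with an optional exponent.
    if (preg_match('/^[+-]?([0-9]+\.?[0-9]*|\.[0-9]+)([eE][+-]?[0-9]+)?\z/', $str)) {
        $dval = (float) $str;
        return NUM_DOUBLE;
    }

    return NUM_NONE;
}

$type = sketch_string_to_number('1.5e3', $lval, $dval);
echo $type === NUM_DOUBLE ? "double: $dval" : "other", "\n"; // prints "double: 1500"
```

A caller such as an is_numeric-style check would then only need to test the returned tag, rather than trying an integer conversion first and a floating-point conversion second.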

Short version: Back-door BC breakage…

NEW: PHP 4.4.3

Derick Rethans, as Release Master of the PHP 4.4 series, posted the
following:

Joe Orton was quick to notice that an item in the ChangeLog
referenced substr_compare(), which doesn’t actually exist in PHP 4.4.3.
Derick, who had fixed the error once already but pressed the wrong button at the
time, fixed it again.

Short version: And yea, there was much dancing in the
streets.

TLK: Method inheritance and parent classes

Christian Weiske mailed internals@ over the fact that the following code executes
B::build() twice, rather than A::build() and then
B::build() as he had anticipated:


<?php

class A {

    function __construct() {
        $this->build();
    }

    function build() {
        echo "buildA ";
    }
}

class B extends A
{

    function __construct() {
        parent::__construct();
        $this->build();
    }

    function build() {
        echo "buildB ";
    }
}

new B();

?>

He suggested that code in a parent class should never end up calling methods
supplied by its subclasses.

Jason Sweat pointed out that this would render the Template Method design
pattern entirely useless, and Richard Lynch added that having it
run B::build() was the whole point of inheritance and subclassing. If
Christian needed to override this behaviour perhaps he should use
A::build(), assuming it was still legal to do so. Or perhaps there was
some get_method() function that could be used, he asked hopefully.

Marian Kostadinov came to the rescue with self::build().
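A minimal sketch of Marian's suggestion, adapted from Christian's example: inside A, self::build() is bound to the class in which the call is written, whereas $this->build() is dispatched on the object's actual class. The $log property is an addition made here so that the call order can be inspected; it is not part of the original code.

```php
<?php

class A
{
    public $log = array();

    function __construct()
    {
        // self:: refers to the class this code is written in (A),
        // so an override of build() in a subclass is bypassed here.
        self::build();
    }

    function build()
    {
        $this->log[] = "buildA";
    }
}

class B extends A
{
    function __construct()
    {
        parent::__construct();
        // $this-> dispatches on the runtime class, so B::build() runs.
        $this->build();
    }

    function build()
    {
        $this->log[] = "buildB";
    }
}

$b = new B();
echo implode(" ", $b->log), "\n"; // prints "buildA buildB"
```

The trade-off is the one Jason raised: a parent that calls self::build() can no longer be customised by overriding build(), which is exactly what the Template Method pattern depends on.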

Short version: Purism leads to yet more confusion, even in the upper
ranks.

CVS: 4_4 branch now open for fixes

Changes in CVS that you should probably be aware of include:

  • In ext/oci8, bugs #37581 (oci_bind_array_by_name clobbers input
    array when using SQLT_AFC, AVC) and PECL #7827
    (oci_pconnect password issue) were fixed [Tony Dovgal]
  • PDO_MYSQL’s object destruction bug #37445 (Reproducible segfault) was fixed [Ilia
    Alshanetsky]
  • Session bug #38289 (segfault in session_decode() when
    _SESSION is NULL) was fixed [Tony]
  • A worse session bug, #38278 (session_cache_expire()’s value does not match
    phpinfo()’s session.cache_expire), was fixed across all
    current branches of PHP [Tony]
  • Last week’s socket_select()/invalid arguments fix was back-ported
    to PHP_4_4 branch [Tony]
  • In ext/curl, bug #38269 (fopen wrapper doesn’t fail on invalid
    hostname with curlwrappers enabled) was fixed [Tony]
  • In ext/pdo_pgsql, bug #38168 (PDO Exception Causes PHP Crash), reproducible when
    bound parameters were missing, was fixed [Ilia]
  • In ext/gd, there is now better error reporting for imagepng(), imagegif()
    and imagejpeg() in line with feature request #36995 [Tony]
  • WDDX serialization’s encoding issues were fixed, resolving bugs #38213, #37611 and #37571 [Ilia]
  • ext/xml gained a new userspace function,
    xml_utf8_encode(), in PHP_5_2 and CVS HEAD [Ilia]
  • Core bug #38276 (file_exists() works incorrectly with long filenames on
    Windows) was fixed, also in 5_2 and HEAD only [Ilia]
  • SPL bug #38303 (spl_autoload_register() suppresses all errors silently)
    was fixed [Ilia]
  • Core bug #38322 (reading past array in sscanf() leads to arbitrary code
    execution) was fixed across all current branches of PHP – including PHP 5.1
    [Tony]
  • In ext/imap, bug #37265 (imap_body() able to bypass
    php_openbasedir) was also fixed across all four open PHP branches
    [Ilia]
  • In ext/simplexml, bugs #38347 (Segmentation fault when using foreach
    with an unknown/empty SimpleXMLElement) and #38354 (Unwanted
    reformatting of XML when using AsXML) were fixed [Tony, Chregu
    respectively]

Dmitry Stogov worked his way through a bunch of install-related stuff, applying
Joe Orton’s PHPIniDir directive for Apache 1.3.*, implementing Richard
Quadling’s ideas for version support in the Windows registry, and allowing the
PHPRC global (also under win32) to specify a full file name.

Meanwhile, this was a busy week in CVS HEAD, where Andrei (who doesn’t like long
threads much) finished up work on the array functions and started working with the
file system functions. He implemented Unicode support for a range of these. He then
added internals macros to test for path separator characters, before going on to add
a new modifier for the s type in the new parameter parsing API (with a
little help from Rob Richards). The & modifier will apply the
specified converter during Unicode string conversion. Andrei ended his week by
adding the add_assoc_zstr_ex() function family to the Zend API.

Rob also gave some assistance to Tony, who was struggling a little with a TSRM
issue, before committing his work bringing Unicode support to the DOM extension.
Dmitry also completed SOAP unicode support during the course of the week.

Short version: Check out CVS HEAD. It’s getting to be worth
it.

PAT: Long-awaited line directive [continued]

Andi responded to Marcus’ line directive implementation with some bad news.
Debuggers wouldn’t work with it; they would need altering to accommodate it. Also,
he suspected that somewhere in the deeper recesses of the Zend Engine lies something
that expects a filename to be ‘correct’. He wrote that he was still investigating to
see whether it might have any far-reaching effects he hadn’t discovered yet.

Andi still wanted the patch enough to care about the syntax, and suggested that
using the existing declare() {} would be better than adding the new
#line. He wound up by promising to contact Marcus if he came across any
further issues.

Marcus explained that he’d used the #line syntax and semantics
because it seemed the most convenient for generator tools, and asked William
Candillon to comment on Andi’s idea of using declare() instead. He also
wasn’t certain that the patch would have no side effects, but had been unable to
find any issues. Marcus wondered if perhaps Zend extensions would be adversely
affected in some way, and pinged Derick to query the #line directive’s
behaviour under XDebug.

Finally, Pierre committed a patch offered by the reporter of openssl bug #36732 (configargs
req_extensions & x509_extensions broken), adding
req_extensions support to openssl_csr_new() and
openssl_csr_sign().

Short version: Teamwork in action.