Zend Weekly Summaries Issue #346

TLK: MySQL and SSL
REQ: INI intelligence
FIX: snprintf/var_export
TLK: unicode.semantics [continued]
BUG: Array in 5_2
TLK: The meaning of proto
CVS: INFILE LOCAL and safe_mode
PAT: SNMP patch and proposal

18th June – 23rd June 2007

TLK: MySQL and SSL

Someone named Christopher Weldon kicked off the week with a query about
the issue of MySQL and SSL connectivity through PHP; did anyone have
a patch to fix it? Since he used the shortcut of referring to two elderly bug reports (duplicates?) rather than
actually describing the problem, nobody with a day job had a clue what he was
talking about. Luckily, student Tijnema was around and had enough time on his
hands to do a modicum of research. He came back with the news that bug #28075 had been assigned to Georg
Richter back in 2004 but apparently had not been fixed. There was, however, a
solution attached to the bug report that Christopher might be able to use.

Christopher replied that the solution in question wasn’t written for the
version of PHP he was using. Besides, he’d be more confident about using it
if the package maintainer had reviewed and applied it in CVS. Tijnema pointed
out that it had been proven to work in PHP 5.0 beta (yes, that long
ago) and suggested that there possibly hadn’t been enough changes made in
ext/mysql in the intervening years to prevent it working now. Had
Christopher tried applying that patch?

Short version: Adding information to the bug report is more likely to result in a fix.

REQ: INI intelligence

Andi Gutmans had the idea of making PHP under Windows capable of recognizing
the .so file extension in php.ini as a shared library, as the
Apache server does in httpd.conf. If this were accepted, it would no
longer be necessary for PHP users to maintain separate sets of INI files for
UNIX and Windows machines.

Stefan Priebsch, although he liked the idea, pointed out that the directory
layout between the systems is usually different. You’d still need to maintain
separate INI files when it came to things like include_path and
extension_dir. David Zülke agreed, and also mentioned
sendmail/SMTP settings. Lukas Smith summarized: for this to be
really useful, there’d need to be a general way to handle platform
differences. Then again, perhaps simply minimizing those differences was a
good place to start… Pierre-Alain Joye explained that the idea was just to
use the same set of extension=foo directives for as many
platforms as possible, not to make php.ini files completely portable.
Directory paths would differ in any case, but at least this part of it is
relatively obvious. Stefan, though, was already brainstorming towards greater
portability:

Pierre wrote that Andi’s original, simple idea was enough to make extension
management easier, and that in itself was worth doing. Forward slashes are
supported under Windows already. As for the final point on Stefan’s list,
this was precisely what was being proposed already!

François Laupretre thought dl() should be made to work in
the same way as Andi’s proposal; the added checks wouldn’t be a problem, since
performance doesn’t matter there. That said, he also thought there should be
no need for the .so filename extension; just the name of the PHP
extension should be enough.
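To make the proposal concrete, here is a minimal sketch of how a loader might map one `extension=` value onto platform-specific filenames. It combines Andi's idea (accept `foo.so` on Windows) with François' suggestion (accept the bare name `foo`). The function names and the candidate order are hypothetical, this is not PHP's actual loader, and the `php_foo.dll` pattern is an assumption based on the Windows naming of the time.

```python
import os
from typing import Callable, List, Optional


def candidate_filenames(directive: str, windows: bool) -> List[str]:
    """Hypothetical: expand an extension=... value into filenames to try, in order."""
    base = directive
    # Accept "foo", "foo.so" or "foo.dll" and reduce each to the bare name "foo".
    for suffix in (".so", ".dll"):
        if base.endswith(suffix):
            base = base[: -len(suffix)]
            break
    if windows:
        # Assumed order: Windows-style php_foo.dll, plain foo.dll, then a *nix-style foo.so build.
        return ["php_" + base + ".dll", base + ".dll", base + ".so"]
    return [base + ".so"]


def resolve_extension(directive: str, extension_dir: str, windows: bool,
                      exists: Callable[[str], bool] = os.path.isfile) -> Optional[str]:
    """Return the first candidate found in extension_dir, or None if none exists."""
    for name in candidate_filenames(directive, windows):
        path = os.path.join(extension_dir, name)
        if exists(path):
            return path
    return None


if __name__ == "__main__":
    # The same php.ini value yields a sensible search list on both platforms.
    print(candidate_filenames("mysql.so", windows=True))
    print(candidate_filenames("mysql.so", windows=False))
```

Reducing every value to a bare name first is what lets a single `extension=` line serve both platforms. François' name-only form then comes for free. The `exists` parameter is injectable only so the lookup can be exercised without real files.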

Apache developer William A Rowe wrote to explain to Andi that what Apache
actually does on Windows is name the module files themselves .so; there is nothing
special about the filename extension .dll. His own Windows builds of
PHP are entirely in the *nix style because his customers need to toggle
between Windows and Unix. He’d found it useful, and noted that documentation
is much simpler when there is only one set of files.

Richard Quadling was less sure. He wrote that most Windows users would find
it ‘very odd’ to see .so files on their system. He went on to
explain that, under Windows, most filename extensions are linked to an
appropriate handler via the registry; .dll files, however, are
registered as “Application Extension” and have no default Run/Open handler.
Giving PHP’s extension files a .so suffix wouldn’t cause any problems under Windows per
se – but, in his view, would be confusing for many Windows users.

Short version: The original idea was better.

FIX: snprintf/var_export

Following last week’s note about the problem of the comma being used as a float decimal
separator in certain locales, Derick Rethans wrote to internals@ to say that
he’d now implemented a new printf() modifier, H.
This is exactly the same as the existing G modifier, but is
locale-insensitive.

Having used his new modifier to fix var_export(), Derick wrote
that he couldn’t do the same for the Zend Engine function
_build_trace_args(). It uses zend_sprintf(), which
is simply a wrapper for the C library’s sprintf(), and that lacks an equivalent
format modifier to Derick’s H. There were three options open at
this point: a) always use PHP’s own sprintf() implementation, b)
use php_printf(), or c) ignore the problem of locale-sensitivity
in _build_trace_args().
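The difference between G and H is easiest to see outside PHP. The Python sketch below is only an analogy, not Derick's C implementation. Python's %-operator always emits a dot, like the new H modifier, while `locale.format_string()` follows LC_NUMERIC, like C's `%G` after a `setlocale()` call. The `de_DE.UTF-8` locale name is an assumption and may not be installed on a given machine.

```python
import locale


def format_g_portable(value: float, precision: int = 17) -> str:
    # Python's %-operator never consults LC_NUMERIC, so the separator is always '.';
    # this mirrors what the new H modifier adds to PHP's printf().
    return ("%%.%dG" % precision) % value


def format_g_localized(value: float, precision: int = 17) -> str:
    # locale.format_string() honours LC_NUMERIC, like C's %G after setlocale().
    return locale.format_string("%%.%dG" % precision, value)


if __name__ == "__main__":
    saved = locale.setlocale(locale.LC_NUMERIC)  # query the current setting
    try:
        locale.setlocale(locale.LC_NUMERIC, "de_DE.UTF-8")
    except locale.Error:
        print("de_DE.UTF-8 not installed; both outputs will match")
    try:
        print(format_g_localized(1.5))  # uses ',' when the German locale is active
        print(format_g_portable(1.5))   # always uses '.'
    finally:
        locale.setlocale(locale.LC_NUMERIC, saved)  # restore the original setting
```

This is why var_export() output could not be trusted as valid PHP source under a comma-decimal locale. Code that is meant to be read back by a parser needs the locale-insensitive form.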

Short version: We’ll just ignore it then, shall we?

TLK: unicode.semantics [continued]

Peter Brodersen picked up the unicode.semantics torch and ran
with it. It seemed to him we’d left if (get_magic_quotes_gpc())
behind only to replace it with a new equivalent, and this even in code
written explicitly for PHP 6.

Rasmus Lerdorf protested that there is no difference between doing this and
writing code that will work under both PHP 5 and PHP 6. The only difference is
that you’d need to check for Unicode mode rather than for the PHP version. As
Rasmus made clear last week, the
plan was to avoid having two separate code bases at a time when resources are
stretched to the limit already. Having the Unicode decision synonymous with
PHP 6 would break that plan and force the team to support PHP 5 for much
longer than anticipated.

The thing that bothered Peter was the need to consider different flavours of
PHP even within a single version. If unicode.semantics is likely
to be switched on in most installations, people might make assumptions about
that, as with magic_quotes_* in the past. His fear was that
hosting companies were likely to end up with different settings, or possibly
even with two separate installations of PHP 6.

Pierre, noting that nothing he and the rest of the core team had said seemed
to influence Rasmus towards a Unicode-only PHP 6, wondered about the base
assumption that retaining back compatibility will lead to more users
upgrading. Rasmus pointed out that he isn’t actually the only team member in
favour of the Unicode switch. Besides, dropping back compatibility for PHP 6
would mean maintaining two separate code bases for the next decade. Lukas
backed Pierre, citing the PHP 5 migration experience. Perhaps it was time for
a different approach, and BC should be broken where it makes sense to do so. A
clean break would give an opportunity to clean up the code base and offer
users a real choice. Sooner or later, Lukas believed, most users would be
attracted by ‘new shiny stuff’ and make a real effort to migrate their
code. Application developer Tomas Kuliavas disputed that; he wrote that he’d
simply ask his end users to run PHP 5, or else run two PHP versions on a
single host. He couldn’t sanely update his code to run under PHP 6 in Unicode
mode because it would break back compatibility, and he wouldn’t be attracted
by Lukas’ “new shiny stuff” because the last “new shiny stuff” he could
actually use came with the PHP 5.1.0 release.

PHP user Jeremy Privett wrote in support of a Unicode-only PHP 6. It didn’t
make sense to him to have the whole selling point of PHP 6 be something that
is capable of being turned off. ‘If you have BC, it’ll get used simply
because it works with old code, but the main thing that changed about the
language will never be touched.’ Vendors are unlikely to support both
versions, but will lose sales if they don’t; their safest option would be to
stay with PHP 4 or 5. In short, the unicode.semantics switch
won’t make anybody’s life easier, and will in fact make it harder for people
to adopt PHP 6. Finally, Jeremy advocated dropping support for PHP 4 and 5 in
a planned manner following the PHP 6 release, with plenty of publicity up
front. The current approach will lead to a majority of applications working
under PHP 4, PHP 5 and a subset of PHP 6 installations, and/or 95% of PHP 6
installations turning off Unicode support.

Rasmus didn’t see a problem with unicode.semantics=off being the
default setting for ISPs, and pointed out that writing full Unicode support
into an application takes quite a bit of work. Large applications and
services that have control over their server environment are the target here.
In fact Rasmus was prepared to remove the unicode.semantics
switch if enough people wanted that, but felt that doing so would mean
‘the regular Joe User on a shared server’ will miss out on every
improvement made in PHP 6. Ilia Alshanetsky pointed out that this meant there
would inevitably be two PHP versions continually under development; one with
Unicode support, and one without. Rasmus disagreed with the notion that PHP 6
without Unicode is effectively a slower version of PHP 5; he thought new
development, such as namespace support, should go into PHP 6 only. Derick
really didn’t like the idea that 95% of hosts running PHP 6 might have
‘the crippled non-unicode version’; it made the whole exercise
pointless. Mike Robinson found the idea that PHP 6 without Unicode might be
“crippled” a startling one, and added his voice to the many calling for a
complete BC break and a cleaner code base.

Pierre meanwhile was busy taking Rasmus apart over his assertion that he
represents the majority of the team. Most of the core developers he’d spoken
to were unhappy with the current Unicode implementation; if he had that
wrong, he’d like to hear from the rest. As for the need to maintain two code
bases – ‘We maintained three branches since a couple of years, having only
two is real progress’. The code would be cleaner – and probably easier to
merge – if the Unicode implementation were kept completely separate.

Jani Taskinen wrote that he believes there will be more PHP 6 users than PHP
5 users, simply because of the demographics; there are far more Asian and
Arabic people in the world than there are Westerners. He’d much prefer to
have a Unicode-only PHP 6 and maintain PHP 5 alongside it. This would ensure
that support for PHP 5 can be dropped at some point, whereas having two
versions in the same branch effectively means supporting it forever. Jani
felt that having to do everything twice over in the same branch was a very
bad idea, and suspected that such considerations hadn’t been uppermost in
people’s minds when the decision to implement unicode.semantics
was taken.

Rasmus wrote rather tersely that it would be nice to know exactly what had
happened to make Jani change his mind about supporting that decision. Jani
retorted that CVS HEAD is becoming unmaintainable to the point that he isn’t
prepared to do the work, particularly given that he sees no need for it to be
this way. Ilia had already written that PHP development is effectively forked;
perhaps it would be wise to simply accept this. Jani added that the adoption
of PHP 5 would have been much faster if the support for PHP 4 had been
officially dropped – a point with which Pierre was quick to agree; support
for PHP 4 should have stopped already. Derick replied swiftly ‘End of the
year’, but Pierre noted that an April Fool’s Day post on a personal blog
doesn’t constitute the official announcement he’d had in mind.

Back on track, Lukas wrote that the PHP 6 adoption rate would most likely
depend on its performance. That said, he felt there are ‘enough pain
points even without Unicode’ to give users an excuse not to migrate. In
Lukas’ opinion, if people didn’t upgrade to a fast-running, Unicode-only PHP
6, the best bet might be to backport its other features to PHP 5.

Ilia pointed out that Rasmus’ statement about the beneficiaries of Unicode
support was relevant here. Most users don’t need it; those developing
multi-language applications do, and those users tend to be large companies
that have full control over their environment. The only thing the average Joe
had to gain by migrating from PHP 5 would be a drop in speed, and Ilia thought
this might be a difficult sell. Outside Unicode support, every other feature
or addition in PHP 6 could easily be ported to PHP 5, as Lukas had mentioned.

Matt Wilmas wondered aloud whether it might be possible to have Unicode
support “always on” but keep the behaviour of PHP 5 code exactly as it is
now. His idea was to put the onus on the user to explicitly write
Unicode-aware PHP; this way, if you got Unicode strings, it would be because
you explicitly triggered them. He had missed, however, that the development
team’s chief concern was the need to support two versions in one.

Andi came in to support Rasmus, stressing the need to retain backward
compatibility and asserting that PHP 6 is slow only because optimization work
hasn’t started on it yet. Lukas wrote again that the possibility of attaining
PHP 4/5 compatibility with the same userland code had hindered rather than
helped PHP 5 adoption. Pierre complained that Andi was ignoring everything
that had been said about the extra work required to maintain two modes in CVS
HEAD. Most of the team were against if (UG(unicode)) for two
reasons: a) PHP 6 without Unicode makes no sense, and b) the extra work
required in the code base. They didn’t even care about its performance, at
least at this stage.

Andi wrote that there will be ‘additional value’ in PHP 6, and this
should be enough to convince even those not needing Unicode support to
upgrade. He completely disagreed with Pierre’s point about PHP 6 making no
sense without Unicode support. The Zend team had invested a lot of time in
working on namespace support and late static binding, although they still
didn’t have a proposal for the latter that performs well.

Pierre argued that, other than Unicode support, there were no plans for PHP 6
that couldn’t also be created in PHP 5 – whether that be via PECL or through
actual backporting. He seriously doubted that those not wanting Unicode would
migrate to PHP 6 in droves, but if they did, all Unicode-specific code should be
moved into an extension so that those who really needed it could use it
without affecting the rest. Unicode support would be slightly more limited as an
extension, and wouldn’t provide ‘all the fancy things like writing PHP
scripts in random languages’, but Pierre saw no reason an extension
couldn’t provide what is needed.

Short version: Much passion spent.

BUG: Array in 5_2

Pierre reported seeing errors like:


Notice: Undefined offset: 534 in run-tests.php on line 1682

when running make test under current PHP_5_2. Obviously several
of the tests were failing, too. He believed something must have been broken
in the array source code recently, adding that the test suite
had worked earlier in the week. However, he didn’t have time to investigate
the problem there and then.

Zoe Slattery and Oliver Block – she from Ubuntu 6.10, he from SuSE 9.3 – both
reported that they couldn’t reproduce Pierre’s problem on their respective
systems.

Short version: A very quiet week.

TLK: The meaning of proto

Oliver wanted to know what the prototype entries above function declarations
in the source code are actually for. Stas Malyshev explained that they simply
describe the arguments and return value of the function, and offer a short
description of the function’s role. A prototype, or ‘proto’, can then be used
by automatic tools to collect information about that function.
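As an illustration of the kind of tool Stas meant, here is a minimal sketch that pulls protos out of C source with a regular expression. It assumes the common `/* {{{ proto RETURN NAME(ARGS)` layout with the description on the following line(s). The sample function `foo_reverse` is hypothetical, and real source files contain variations (aliases among them) that this pattern will not catch.

```python
import re
from typing import Dict, List

# Matches "/* {{{ proto <return> <name>(<args>)" followed by a description up to "*/".
PROTO_RE = re.compile(
    r"/\*\s*\{\{\{\s*proto\s+"
    r"(?P<returns>\S+)\s+"
    r"(?P<name>\w+)\s*\((?P<args>[^)]*)\)"
    r"(?P<description>.*?)\*/",
    re.DOTALL,
)


def extract_protos(source: str) -> List[Dict[str, str]]:
    """Collect the return type, name, arguments and description of each proto."""
    protos = []
    for m in PROTO_RE.finditer(source):
        protos.append({
            "name": m.group("name"),
            "returns": m.group("returns"),
            # Collapse line breaks and indentation into single spaces.
            "args": " ".join(m.group("args").split()),
            "description": " ".join(m.group("description").split()),
        })
    return protos


# Hypothetical sample in the usual PHP source layout.
SAMPLE = """
/* {{{ proto string foo_reverse(string str [, bool ascii_only])
   Return str with its characters in reverse order */
PHP_FUNCTION(foo_reverse)
"""

if __name__ == "__main__":
    for proto in extract_protos(SAMPLE):
        print(proto["returns"], proto["name"] + "(" + proto["args"] + ")", "-", proto["description"])
```

Closing markers such as `/* }}} */` are ignored because the pattern requires the `proto` keyword.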

Oliver later used this information. He posted a patch ‘to insert the
missing function entry for imap_listscan()’. Unfortunately, what Stas
hadn’t told him was that prototype entries are also used in a completely
different way – namely, to indicate function aliasing, of which there is
rather a lot in the imap extension.

Short version: A for Effort.

CVS: INFILE LOCAL and safe_mode

Changes in CVS that you should probably be aware of include:

  • A couple of fixes that didn’t make it into CVS HEAD during the ‘big
    merge’ last week finally went in (foreach() by-ref and bug #40432 (strip_tags()
    fails with greater than in attribute)) [Dmitry]
  • In the Apache SAPI, bug #41628
    (PHP settings leak between VirtualHosts in Apache 1.3) was fixed
    across all current branches of PHP [Scott MacVicar]
  • In ext/libxml, bug #41724 (libxml_get_last_error() – errors survive
    request scope) was fixed [Ilia]
  • Core bug #41686 (Omitting
    length param in array_slice not possible) was fixed
    in PHP_5_2 branch only [Ilia]
  • Use of MySQL’s INFILE LOCAL option is now disallowed
    when safe_mode is active in ext/mysql, ext/mysqli
    and ext/pdo_mysql, depending on which PHP version you look at
    [Stas]
  • The bundled PCRE library was upgraded to version 7.2 in CVS HEAD and
    PHP_5_2 [Nuno Lopes]
  • Core bug #39215 (Inappropriate
    close of stdin/stdout/stderr) was
    re-visited and fixed, hopefully for good this time [Dmitry]
  • The recode extension should now work on amd64 across all current
    branches of PHP [Stas]
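One way a MySQL client can refuse LOCAL INFILE is to leave the CLIENT_LOCAL_FILES capability out of the flags it sends when connecting. Here is a minimal sketch of that policy decision. The value 128 for CLIENT_LOCAL_FILES matches the MySQL client protocol, but the function itself is hypothetical and is not Stas' patch.

```python
CLIENT_LOCAL_FILES = 128  # capability bit from the MySQL client/server protocol


def effective_client_flags(requested: int, safe_mode: bool) -> int:
    """Hypothetical: strip LOCAL INFILE support from the connect flags when safe_mode is on."""
    if safe_mode:
        # Clear only that bit; all other requested capabilities pass through unchanged.
        return requested & ~CLIENT_LOCAL_FILES
    return requested
```

Clearing a single bit, rather than overwriting the flags, keeps any other options the script asked for intact.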

In other CVS news, Derick picked out Ilia’s fix last week for bug #41655 (open_basedir
bypass via glob()) as being the one to blame for his freshly
broken PHP_4_4 build. Pierre came up with a theory as to why this should be:
php_dirname() in PHP 4 alters the path it is given, whereas in PHP 5
the path passed to the dirname() function remains untouched. In PHP 4, the only
workaround he could see was to duplicate the path prior to calling
dirname() on it. He provided a fix that should
work under PHP 4, but confessed to not having tested it beyond make
test.

Short version: This was definitely Pierre’s week.

PAT: SNMP patch and proposal

One Gustaf Gunnarsson posted a trivial patch against current PHP_5_2 to fix
memory leaks in the SNMP module. However, he was more interested in proposing
support for multiple get/set operations in a single PDU. He’d rather not
utilize the existing php_snmp_internal() generic object fetcher,
calling it ‘too complex and hard to audit’, and wanted to know whether
avoiding it would be acceptable to the extension maintainers (who?).

The functions Gustaf envisioned would be something like:


array snmp_mget(string $version, string $hostname, mixed $authparameters, array $variables, [int $timeout, [int $retries]])
boolean snmp_mset(string $version, string $hostname, mixed $authparameters, array $variables, [int $timeout, [int $retries]])

where $authparameters may be one of string <community>,
array(<community>) or array(snmpv3param1, snmpv3param2, ...). The return
value of snmp_mget() would be OID/value pairs, similar to the return value
of snmprealwalk().
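Gustaf's three accepted forms for $authparameters imply a small dispatch step before any PDU is built. The snmprealwalk()-style result is simply an OID-to-value mapping. Here is a minimal sketch of both, with hypothetical function names and no real SNMP traffic.

```python
from typing import Dict, List, Sequence, Tuple, Union

Auth = Union[str, Sequence[str]]


def classify_auth(auth: Auth) -> Tuple[str, Tuple[str, ...]]:
    """Hypothetical: map the proposed $authparameters forms to a mode and its parameters."""
    if isinstance(auth, str):
        return ("community", (auth,))       # string <community>
    params = tuple(auth)
    if not params:
        raise ValueError("empty authentication parameters")
    if len(params) == 1:
        return ("community", params)        # array(<community>)
    return ("v3", params)                   # array(snmpv3param1, snmpv3param2, ...)


def pair_results(oids: List[str], values: List[str]) -> Dict[str, str]:
    """Hypothetical: build an OID => value mapping in request order, like snmprealwalk()."""
    if len(oids) != len(values):
        raise ValueError("expected exactly one value per requested OID")
    return dict(zip(oids, values))
```

For example, `classify_auth("public")` yields `("community", ("public",))`. Rejecting a length mismatch in `pair_results()` avoids silently attaching values to the wrong OIDs if an agent returns a short response.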

Debian dude Sean Finney sent in a short note to say that he’d seen a report about memory leaks in
ext/snmp, but wasn’t familiar enough with the code to comment on
Gustaf’s patch.

Short version: The patch looks fine, the plan marginally less so.