Zend Weekly Summaries Issue #338

TLK: Zend API docs
REQ: Coding recommendations
TLK: The vagaries of require_once
TLK: PHP 6 TODO list [continued]
TLK: PHP 5.2.1 performance
TLK: Accessing the Zend MM
NEW: PHP 5.2.2 RC2
TLK: Testing PHP 6
TLK: HTTP arrays
CVS: New CLI option to return INI path
PAT: Interpolated strings, direct access to statics

TLK: Zend API docs

Someone named Ci was struggling with a patch he was writing to log individual
calls to the mail() function. He couldn’t find a way to access
$_SERVER variables internally:


zval *data = NULL;

zend_hash_find(Z_ARRVAL_P(PG(http_globals)[TRACK_VARS_SERVER]), "DOCUMENT_ROOT", sizeof("DOCUMENT_ROOT"), &data);
strcpy(buffer, Z_STRVAL_P(data);

gave him an incorrect value for buffer. Could anyone help?
Tijnema – who knows little of PHP internals but plenty about the C language –
jumped in to vet the code sample Ci had given, but it seems that this was just
a very rough version of what Ci actually had on his development box.

Stefan Esser explained that the code should look more like:


zval **data;

if (zend_hash_find(Z_ARRVAL_P(PG(http_globals)[TRACK_VARS_SERVER]), "DOCUMENT_ROOT", sizeof("DOCUMENT_ROOT"), &data)==SUCCESS) {
    ...
    strlcpy(buffer, Z_STRVAL_PP(data), sizeof(buffer));
}

but Ci was still struggling. The holdup appeared to be the call to
strlcpy(). Tijnema suggested zero termination, and this seems to
have done the trick. Ci, thanking everyone involved, noted in passing that
‘documentation of the PHP API is tragic’, and of course everybody
immediately suggested he should submit some docs of his own.
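
The thread doesn't record exactly what Ci changed, only that zero termination did the trick. As a rough sketch of the general idea, here is a bounded copy that always writes its own terminator. copy_terminated() is a hypothetical helper invented for this illustration; it is not part of PHP or the Zend API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * copy_terminated() is a hypothetical helper, not part of PHP or the Zend API.
 * It copies at most size - 1 bytes of src into dst and always writes a
 * terminating NUL, which plain strncpy() does not do when src is too long.
 * Returns the number of bytes copied, excluding the terminator.
 */
static size_t copy_terminated(char *dst, const char *src, size_t size)
{
    size_t len;

    assert(dst != NULL && src != NULL && size > 0);

    len = strlen(src);
    if (len >= size) {
        len = size - 1;      /* truncate to leave room for the NUL */
    }
    memcpy(dst, src, len);
    dst[len] = '\0';         /* explicit zero termination */
    return len;
}
```

Called with a five-byte buffer and a longer source string, it keeps the first four bytes and terminates the result, so later string functions never read past the end of the buffer.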

This set Gwynne Raskind (still Daughter of the Code) wondering if it mightn’t
be a good idea for her to write a book on the subject. Derick promptly made
her aware of Sara Golemon’s Extending and Embedding PHP, which Gwynne had
somehow missed during her research
(let’s hope Sara’s publishers aren’t reading this). Gwynne didn’t see why the
existence of one book precludes the writing of another, but noted that the
official documentation is clearly in need of updating. She proposed
undertaking that task instead, ‘if people think that would be more
useful’ (it would).

Tony Dovgal had been thinking of adding documentation directly into the
source itself to provide a basis for the manual, but hadn’t been able to find
a tool that would fit his needs. Although finding Doxygen output
‘completely unreadable’, he noted that it provides a much wider range
of output formats than most of the tools available. That said, he for one
would appreciate some attention being paid to the Zend API documentation. As
he wrote, ‘the problem is that you need to know the internals a bit be
able to document it, and all the people with this knowledge are busy with
other things most of the time…’ but he would nonetheless be prepared to
pass on hints. Stas Malyshev later offered similar support to anyone prepared
to write ‘human docs’.

Doxygen fan Marcus Börger promptly launched into the same defense he
usually gives for choosing it above all other forms of documentation, and
there was a brief and unseemly skirmish. Somewhere in the middle of it all,
Gwynne wrote that she’d written enough extensions to have ‘some idea of
the internals’ and was willing to give Zend API documentation a try.
Would it be better to ask for CVS karma immediately, or should she wait? Tony
suggested gently that she should set up a build environment for the docs
(a surprisingly non-trivial task) and start by sending in patches.

Short version: No glory without blood, sweat and tears.

REQ: Coding recommendations

Someone named mbneto posted a note to internals@ requesting a definitive list
of the features that will be altered or deprecated in PHP 6. He hoped to use
such a list as a basis to start preparing PHP users for the new version.

One Davi Vidal pointed him to the original PDM notes
(not always to be taken as gospel these days) and to Lukas Smith’s wiki, which hosts
the project TODO list.

Lukas himself intervened to say that he’d be giving a presentation on that
very subject at the upcoming MySQL User Conference. He promised to make the
slides from the talk available on his blog later in the week.

Short version: This is what E_STRICT is there for.

TLK: The vagaries of require_once

A Dominic Letz wrote to the internals list, initially with a complaint about
strange results in PHP
5.2.1 when using the reference operator on arrays. For a start, the entire
array was copied on write when called by reference, and for a second thing, a
call by reference was much slower than a call by value. Could anyone explain?

Before anyone even had time to do so, Dominic was back with another niggle; he’d found
that require_once is very much slower than a simple
require. Worse, it appeared to be slower than a function he’d
written in PHP code, myrequire_once()!

Longtime PHP contributor David Sklar explained nicely about the
stat() calls in require_once and the reasons full
paths should be used throughout. Tony, meanwhile, went away and benchmarked
Dominic’s function. He returned with the news that the PHP function was
approximately 5 times slower than require_once, which was hardly
unexpected. Dominic was simply stunned by the speed of Tony’s machine, as
evidenced by those results, but Derick Rethans, Tijnema and Tony all wrote
that Linux – whether installed on a fast machine or not – is faster than
Windows.

Stas explained further; the filesystem layer in Windows operating systems is
very slow, compared to Linux, and this will affect the performance of any
function involving system calls at that level. That said, since Dominic’s
homemade function neither resolves nor canonicalizes paths, it is in no way
equivalent to require_once anyway.
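
To see why canonicalization matters (and why it costs something), here is a minimal sketch using POSIX realpath(). It is not how PHP implements require_once; same_file() is a hypothetical helper that only shows the idea: two different spellings of a path can name the same file, and finding that out means asking the filesystem.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* ask for the POSIX realpath() declaration */
#endif

#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

#ifndef PATH_MAX
#define PATH_MAX 4096
#endif

/*
 * same_file() is a hypothetical helper for illustration only; it is not how
 * PHP implements require_once. It canonicalizes both paths with realpath(),
 * which has to consult the filesystem, and compares the results.
 * Returns 1 if both resolve to the same path, 0 if not, -1 on error.
 */
static int same_file(const char *a, const char *b)
{
    char resolved_a[PATH_MAX];
    char resolved_b[PATH_MAX];

    assert(a != NULL && b != NULL);

    if (realpath(a, resolved_a) == NULL || realpath(b, resolved_b) == NULL) {
        return -1;   /* for example, one of the paths does not exist */
    }
    return strcmp(resolved_a, resolved_b) == 0;
}
```

A naive "have I seen this string before?" check would treat "." and "./." as two different files. The canonical comparison does not, but it pays for the answer in filesystem calls, which is the cost David and Stas were describing.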

Short version: require_once behaviour comes up so often it should
be in the FAQ.

TLK: PHP 6 TODO list [continued]

Continuing his efforts to administrate the PHP 6 TODO list, Lukas found
there’d been no apparent movement on the runtime JIT implementation since
March 6th. He asked if anyone else felt capable of taking over the task from
Dmitry Stogov and Pierre-Alain Joye, since both developers are apparently too
busy to spend time on it. This, as Lukas noted, is now the main thing standing
in the way of a PHP 6 preview release.

Moving on to the question of namespace support at Andrei Zmievski’s request,
Lukas noted that although there had been agreement over including it, there
hadn’t been any recent activity in that area. He understood that Marcus
currently doesn’t have time to take ownership of the task, but hoped he might
still be able to mentor someone else’s work. Given that there are only two
major issues outstanding – the choice of a namespace separator and the
problem of fixing import – Lukas proposed that the backslash
character be used as an interim separator, allowing the team
to focus on the implementation itself. Pointing to Jessie Hernandez’ summary
of the current status, he asked whether a) Jessie is still
interested in implementing namespace support and b) anyone on the Zend team
has time to help him test and commit it? Andrei pointed out swiftly that Zend
karma isn’t necessary to help with the task itself. Guilherme Blanco equally
swiftly posted his vote for ::: as a namespace separator,
provoking a heartfelt plea from Stas: ‘Please not the “my separator
is weirder than yours” thread again!’

Jessie wrote in, explaining that he didn’t have much spare time himself
lately but would like to continue working on the implementation. He asked
whether the patch would only be needed for PHP 6, and both Stas and Lukas
confirmed this.

Finally, Lukas went to the issue of late static binding. Has a syntax been
agreed? He had a recollection of static:: being the favourite.
Andrei agreed that that seemed fine to him, and Marcus clarified the
situation; yes, everyone had agreed to that syntax. The problem was
that the only implementation produced so far came at the cost of a slowdown
for every function call. A proposed solution – pushing the context onto the
calling stack along with $this – would mean pushing four
pointers rather than the current three; there has been no analysis of the
performance impact of this solution to date.

Short version: Late static binding, namespace support and runtime
JIT are the biggest holdup areas.

TLK: PHP 5.2.1 performance

Mauro Infantino wrote to internals@ asking whether there are any known
performance issues with PHP 5.2.1? He’d found that one of his homegrown
applications took nearly twice as long to perform under PHP 5.2.1 as under
PHP 5.1.6, and Xdebug profiling made it several times worse. He could send
cachegrind files from both if need be.

Andrew Hutchings wrote that he’d given a heads up about PHP 5.2.1’s
performance under Linux before now, but
‘it was all in my imagination apparently’. Mauro replied that it was
possible there is no performance problem – it’s just that he didn’t have a
logical explanation for the behaviour he was seeing. Perhaps there was some
specific extension or Apache version causing the slowdown, in which case it
should be documented and reported to prevent others falling victim to the
same thing. Andrew simply suggested that Mauro upgrade to PHP 5.2.0 instead.

Having had no further response, Mauro undertook a little investigation. He’d
benchmarked PHP 5.1, PHP 5.2.0, PHP 5.2.1 and PHP 5.2.2 RC1 under Windows, and
wrote that the strcat() issue raised by Andrew’s initial report
will be resolved with the release of PHP 5.2.2. However, he’d noticed along
the way that bench.php doesn’t test any OO operations, and had started
work on a new script that attempts to cover as many scenarios as possible. So
far, he hadn’t found a particular performance problem that would explain his
own experience – but given the number of PHP 5 OO frameworks out in the wild,
this could be an issue.

Having said all that, Mauro had found that class definitions – without
execution – take longer under PHP 5.2.x. He was trying to find a way to
benchmark this, but couldn’t find a way to iterate the use case enough times
to get a significant result. He wasn’t sure whether eval() was
an option, because he didn’t know whether class definitions within
eval()’d code would follow the same process. Did anyone have an
idea as to how he could achieve this?

Richard Lynch obliged:


<?php

$file = fopen('/tmp/foo', 'w'); //or whatever.
fwrite($file, "<?php\n"); //so the generated file is parsed as PHP
for ($i = 0; $i < 1000000; $i++) {
    $class = <<<EOC
    class foo_$i {
        //more stuff here
    };
EOC;
    fwrite($file, $class);
}
fclose($file);
//start timer
require '/tmp/foo';
//end timer

//"Crude, but effective Captain." -- Spock

?>

Mauro thanked him for this useful sanity check, but wrote that he believed
he’d found the problem. It was a purely Windows issue, related to relative
path inclusions, and would be noticeable to anyone using
include_path and __autoload() for their classes.
He’d now filed a bug report, and thanked those who had written to him – on-list and off – for
their help in tracking this down.

Short version: You can always spot a Windows-only bug – it’ll be
the one nobody else can reproduce.

TLK: Accessing the Zend MM

Brian Shire had been familiarizing himself with the new Zend memory manager,
and believed he’d spotted a problem with it. He wrote to internals@ asking
how zend_mm_set_heap() could possibly work outside
zend_alloc.c, given that the zend_mm_heap struct
definition is in the C file rather than in the header? Shouldn’t the
allocation globals and associated structures be moved into the header file?

Dmitry Stogov explained that the function is used as a substitution for the
main PHP heap, so that emalloc() and related functions will
work. Brian thanked him for the explanation, but pointed out that none of the
heap struct values are accessible outside of zend_alloc.c. Outside that
file, something like:


zend_mm_heap *heap;
heap->overflow=0;

would generate a compile-time error. He posted a rough patch showing what
he’d done to make the struct properties accessible, but Dmitry’s response was
that he shouldn’t be touching zend_mm internals, full stop. The
whole point of encapsulating the internal memory structures in a source file
was that future changes to the memory manager could be made without breaking
binary compatibility. Internals programmers can replace the storage backend
or use several heaps via the provided API, without touching
zend_alloc.c at all.
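
The encapsulation Dmitry describes is the classic opaque-struct pattern. Below is a minimal single-file sketch of it. Every name in it (demo_heap and friends) is hypothetical and has nothing to do with the real zend_mm API; the comments mark which parts would normally live in a header, in caller code, and in the private source file.

```c
#include <assert.h>
#include <stdlib.h>

/* --- what a public header would expose (all names hypothetical) --- */
typedef struct demo_heap demo_heap;     /* incomplete type: layout is hidden */

demo_heap *demo_heap_create(size_t limit);
int        demo_heap_reserve(demo_heap *heap, size_t bytes);
size_t     demo_heap_used(const demo_heap *heap);
void       demo_heap_destroy(demo_heap *heap);

/* --- caller code: sees only the declarations above --- */
static size_t caller_example(void)
{
    demo_heap *heap = demo_heap_create(64);
    size_t used;

    if (heap == NULL) {
        return 0;
    }
    /* heap->used = 0;  would not compile here: demo_heap is incomplete */
    demo_heap_reserve(heap, 16);
    demo_heap_reserve(heap, 100);       /* refused: exceeds the limit */
    used = demo_heap_used(heap);
    demo_heap_destroy(heap);
    return used;
}

/* --- private implementation, normally in its own .c file --- */
struct demo_heap {
    size_t limit;   /* fields can change without breaking callers */
    size_t used;
};

demo_heap *demo_heap_create(size_t limit)
{
    demo_heap *heap = malloc(sizeof(*heap));

    if (heap != NULL) {
        heap->limit = limit;
        heap->used = 0;
    }
    return heap;
}

/* Returns 1 if the reservation fits within the limit, 0 otherwise. */
int demo_heap_reserve(demo_heap *heap, size_t bytes)
{
    assert(heap != NULL);
    if (bytes > heap->limit - heap->used) {
        return 0;
    }
    heap->used += bytes;
    return 1;
}

size_t demo_heap_used(const demo_heap *heap)
{
    assert(heap != NULL);
    return heap->used;
}

void demo_heap_destroy(demo_heap *heap)
{
    free(heap);
}
```

Because callers only ever hold a pointer to an incomplete type, the fields of the struct can be reordered or replaced in a later release without recompiling anything that uses the API, which is the binary-compatibility argument in a nutshell.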

Short version: Talking ’bout Zend documentation…

NEW: PHP 5.2.2 RC2

Ilia Alshanetsky, as Release Master for the PHP 5.2 series, announced the
availability of the second and hopefully final release candidate for PHP
5.2.2.

Short version: Download, test, and if there’s no obvious bug – mail
your test suite output to QA anyway.

TLK: Testing PHP 6

GSoC 07 participant David Coallier wrote to the internals list introducing
himself and his project, a refactoring of an existing framework to use PHP 6.
One of the main goals of the project is to help with the development and
testing of PHP 6, and David wanted to know which features the development
team would most like to see tested.

Mike Wallner wrote that the output buffering/handler code is completely new
and David should – in his own words – ‘torture it’. He also suggested
text processing (including compression), database access and XML handling as
areas to focus on.

David, helpfully, posted a list of the PHP 6 elements he already knew would
need testing:

  • cookies (setting, reading, etc)
  • sessions (setting, reading, destroying, etc)
  • XML Handling (SimpleXML, DOMDocument, XMLReader, XMLWriter)
  • Database Accessibility (PDO, et al)
  • output buffering, handling
  • perhaps namespaces if it gets into HEAD anytime soon
  • SPL (ArrayAccess, Iterator, IteratorIterator, DirectoryIterator,
    etc…)
  • array handling in general
  • general 64-bit capabilities
  • iteration over multi-dimensional arrays
  • goto/jump *grumble*
  • real static bindings
  • APC in the core… seeing how it reacts with unicode (when done)

Short version: It seems that PHP 6 will have fairly
comprehensive pre-release testing. Good.

TLK: HTTP arrays

Someone with a choice of names (apparently) wrote to internals@ asking why
the support for HTTP arrays (bracket syntax) had been removed in PHP 5.1.3?
Was it by accident or by design?

Tony asked openly what many were thinking: ‘What are you talking
about?’

Jochem Maas suggested that it was probably a reference to the BC break in
http_build_query(). It seems a lot of PHP applications make use
of it to automatically convert $_GET and $_POST
elements into query strings, but since PHP 5.1.3
http_build_query() URL-encodes the square brackets (as %5B and %5D).
Jochem had a workaround for this:


function http_build_query_unborker($s) {
    return preg_replace('#%5[bd](?=[^&]*=)#ei', 'urldecode("\0")', $s);
}


but commented, before anyone else could say it, that it made his skin crawl
just a little.

Mike Wallner wrote, fairly unhelpfully, that the function works as expected
under PHP 5.2.0. He eventually clarified this: the output for
get.php?a%5B%5D=1 should be (and is)


array(1) {
["a"]=>
  array(1) {
   [0]=>
    string(1) "1"
  }
}

The original poster reported unhappily that ‘a vanilla example works
fine’ for him, too; it seemed the behaviour he saw only occurs under
certain conditions. He promised to try and reproduce it. Jochem wrote that,
having gone back and tested, he could also no longer reproduce his
original problem. Concluding that ‘today there is no spoon’, he
apologized for the noise.

Short version: A ghost in the machine.

CVS: New CLI option to return INI path

Changes in CVS that you should probably be aware of include:

Prior to PHP 5.2.2 RC2:

  • Ilia’s fix for JSON bug #41034
    last week was merged to CVS HEAD [Tony]
  • ext/curl has a new constant,
    CURLOPT_FTP_CREATE_MISSING_DIRS, in both 5_2 branch and HEAD
    [Tony]
  • ext/gmp gained a new function, gmp_testbit(), in
    CVS HEAD only. The function tests whether a specified bit is set. There is
    also a useful new constant in 5_2 and HEAD, GMP_VERSION
    [Tony]
  • New kid on the block Scott MacVicar fixed GD bug #40130 (TTF usage doesn’t work
    properly under Netware).
  • In CVS HEAD, the new commandline option --ini returns the
    INI path. This change means --ri is always available, regardless
    of whether reflection is enabled [Marcus]
  • In both CVS HEAD and PHP_5_2, the new commandline switch --ri main
    displays the core INI entries [Hannes Magnusson]
  • In PHP_5_2 branch only, simplexml bug #41175 (addAttribute()
    fails to add an attribute with an empty value) was fixed [Ilia]

After PHP 5.2.2 RC2:

  • A crash at server startup when a log message is printed was fixed in
    nsapi, the Netscape/SunONE/iPlanet server module [Uwe Schindler]
  • An invalid read caused by an uninitialized pointer in the Zend Engine
    was fixed (bug #41209) [Tony]

In other CVS news, Tony reverted the changes he made to
mysql_ping() in ext/mysqli a couple of weeks ago. MySQL extension
maintainer Georg Richter wondered why on earth he’d set reconnect as the
default behaviour, and asked him to revert to avoid a BC breakage. Tony
explained that the BC breakage already exists in the MySQL API; he’d simply
been trying to make the function behave as documented. Georg noted that the
breakage had in fact been documented in the MySQL API documentation – it was
only missing from the PHP manual.

Short version: Nobody’s infallible.

PAT: Interpolated strings, direct access to statics

Hannes put up a patch for review fixing a couple of problems with the
php.ini search path in PHP CLI. It prevents the INI being picked up
from the current working directory on BSD platforms, and resolves the path to
the correct PHP binary location. Edin promptly gave it his vote, but Stas
looked into the patch and wondered why Hannes needed to reference the
VCWD (virtual cwd), since tsrm_realpath() uses
getcwd() anyway. Edin supplied the explanation.

Stas was suitably impressed, and Hannes’ patch went into CVS HEAD without
further ado.

Matt Wilmas chased up on the heredocs/interpolated strings optimization he
offered last week, having had no response to it. Marcus wrote that it should
be committed to CVS HEAD for now, and added to PHP 5.3 at a later date if it
works well. He asked whether Matt had run any benchmarks? Matt pointed to his original message,
which contained a bunch of stats showing a 10-15% overall improvement –
particularly in scanner performance in ZTS builds – and a massive difference
for long heredoc strings. He added in an aside to Andrei that he’d spotted a
backticks bug in the parser on his travels. Andrei promptly fixed it, and
wrote that he’d also like to see Matt’s optimization patch go into CVS HEAD
and PHP 5.3. Andi Gutmans was a little more circumspect and asked for a few
extra days to review, given that the patch alters scanner behaviour.

Etienne Kneuss chased up on a more elderly patch of his own, offering
dynamic access to static members, constants
and methods. He wrote that currently the only way to achieve this is to
create a static method and use the ‘quite slow’:


call_user_func(array($classname, 'getThat'));


Could someone please comment on his patch? Edin agreed that it would be
nice to be able to directly access
statics without resorting to call_user_func(), but Tony was less
keen and gave it ‘+0 if it’s for HEAD only’. He also commented that
there should be more tests for new functionality.

And finally, one Tyler Lawson sent a patch to alter the way JSON handles
ampersands, saying he’d had problems when sending JSON data through
POST requests where an ampersand separates variable/value pairs.
He wanted to convert & to \u0026 to eliminate the
problem.

A bemused Tony asked for a short reproduce case, which Tyler obligingly gave.
Rasmus, having looked at Tyler’s code, explained that the ampersand problem
was due to a misuse of the Content-Type header. JSON – or any – data sent as


Content-Type: application/x-www-form-urlencoded

should really be urlencoded. Raw JSON data, on the other hand, should
have the MIME type application/json and be fetched directly from
the raw POST data. Tyler, realizing at once that Rasmus was
correct, apologized for the noise.
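
To make Rasmus’ point concrete, here is a small sketch of form-style URL encoding. form_urlencode() is a hypothetical helper written for this illustration, not PHP’s own urlencode(). It leaves ASCII letters, digits, '-', '_' and '.' alone, writes '+' for a space and percent-encodes everything else, so an ampersand inside a JSON payload becomes %26 and can no longer be mistaken for a pair separator.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * form_urlencode() is a hypothetical helper, not PHP's urlencode().
 * It writes the encoded form of src into dst (capacity: size bytes,
 * including the terminating NUL).
 * Returns 0 on success, -1 if dst is too small.
 */
static int form_urlencode(char *dst, size_t size, const char *src)
{
    static const char hex[] = "0123456789ABCDEF";
    size_t out = 0;

    assert(dst != NULL && src != NULL && size > 0);

    for (; *src != '\0'; src++) {
        unsigned char c = (unsigned char)*src;

        if (c < 0x80 && (isalnum(c) || strchr("-_.", c) != NULL)) {
            if (out + 1 >= size) return -1;
            dst[out++] = (char)c;           /* safe character: copy as-is */
        } else if (c == ' ') {
            if (out + 1 >= size) return -1;
            dst[out++] = '+';               /* form encoding for a space */
        } else {
            if (out + 3 >= size) return -1;
            dst[out++] = '%';               /* everything else: %XX */
            dst[out++] = hex[c >> 4];
            dst[out++] = hex[c & 0x0F];
        }
    }
    dst[out] = '\0';
    return 0;
}
```

Fed the JSON text {"a":"b&c"}, it produces %7B%22a%22%3A%22b%26c%22%7D. The alternative Rasmus describes, sending the raw JSON with the MIME type application/json, needs no encoding at all.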

Short version: A short lesson on JSON usage.