Zend Weekly Summaries Issue #306

TLK: __autoloading functions [continued]
TLK: Ending an endless loop
TLK: Windows libxml2 build
TLK: Internals newbies
TLK: Module API changes
NEW: PHP 5.2.0 RC4
REQ: Filter release
TLK: Memleaks and virtual_file_ex
RFC: Filter API proposal
CVS: More about filter

TLK: __autoloading functions [continued]

One Daniel Hahler took issue with Marcus Börger’s contention that having
autoloaded functions would mean a slow down for every function call. Wouldn’t the
autoload option be considered only after PHP had found the function didn’t
exist? – in which case, it shouldn’t affect performance at all.
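
Daniel's reading can be pictured with a toy dispatcher in C. This is only a sketch under stated assumptions: the table, the hook and every name in it (fn_table, autoload_hook, call_function) are invented for illustration and are not the Zend Engine's own structures. It shows the hook running only when a lookup misses, and says nothing about compile-time binding or opcode caches.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified function table; not the Zend Engine's. */
typedef int (*fn_t)(int);

struct fn_entry {
    const char *name;
    fn_t        fn;
};

#define MAX_FUNCS 16

struct fn_entry fn_table[MAX_FUNCS];
size_t fn_count = 0;
int autoload_calls = 0;   /* counts how often the hook actually runs */

/* Linear lookup by name; returns NULL when the function is unknown. */
fn_t find_function(const char *name)
{
    size_t i;
    for (i = 0; i < fn_count; i++) {
        if (strcmp(fn_table[i].name, name) == 0) {
            return fn_table[i].fn;
        }
    }
    return NULL;
}

/* Adds a function to the table; returns -1 if the table is full. */
int register_function(const char *name, fn_t fn)
{
    if (fn_count >= MAX_FUNCS) {
        return -1;
    }
    fn_table[fn_count].name = name;
    fn_table[fn_count].fn = fn;
    fn_count++;
    return 0;
}

int twice(int x) { return 2 * x; }

/* Stand-in for a user-supplied autoload handler (assumption). */
void autoload_hook(const char *name)
{
    autoload_calls++;
    if (strcmp(name, "twice") == 0) {
        register_function("twice", twice);
    }
}

/* Calls a function by name. The hook is consulted only after the
 * first lookup misses; returns -1 if the name is still undefined. */
int call_function(const char *name, int arg, int *result)
{
    fn_t fn = find_function(name);
    if (fn == NULL) {
        autoload_hook(name);
        fn = find_function(name);
        if (fn == NULL) {
            return -1;
        }
    }
    *result = fn(arg);
    return 0;
}
```

In this sketch the first call to "twice" pays for one hook invocation, and every later call finds the function on the first lookup and never reaches the hook.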

Michael Walter mentioned the impossibility of catching a fatal error.

François Laupretre agreed with Terje Slettebø’s point about the value of multiple
paradigms, writing that one of PHP’s biggest advantages is its short learning curve;
users don’t need to learn about object orientation in order to produce scripts. For
that reason, he couldn’t understand the emphasis given to object oriented features
by some of the core developers.

François felt it should be made possible to autoload both functions and
constants. He argued that organizing (or splitting) a program into several files
would become simpler, as those files would be included on demand, and that avoiding
the need to load included files and definitions could greatly improve performance.
He also wanted to rewrite the autoload feature to include a default
__autoload() handler; one part of this handler would scan the source
files and create a symbol map ‘offline’ (huh?) to be used during runtime for
symbol resolution. This would avoid the need for one file per class and a naming
strategy. A similar mechanism could register every constant, function, class and
interface defined by each extension present on the server and store it in a map file
in the extension directories, allowing extensions to be loaded JIT as well – a
replacement for dl(), which François greatly misses. In fact, François
had actually written such a tool and runtime handler in PHP… it’s available for
download here.

All things considered, François didn’t see how there could be any performance
impact from extending __autoload() to include functions and constants:
‘It is just a hook in the error handler and does not slow down the code for
defined symbols.’

Rasmus Lerdorf pointed out dryly that anything that took these tasks away from
the compiler and into the executor by definition would have a performance
impact when using an opcode cache.

Stas Malyshev, responding to Daniel’s question, agreed that it would be
technically possible to bypass the fatal error when a function is missing and call
an autoloading function instead. That said, he saw function autoloading as
‘conceptually wrong’. It’s rare to keep functions in individual files; if you
wanted to autoload them, you would either need to build complex tables to map them
(as with François’ CLI tool) or organize them in well-defined modules. Such modules
already exist; they are called classes, and should be used when you want to autoload
a group of related functions. On the other hand, Stas felt that allowing an
assortment of unrelated functions to be autoloaded would probably promote very ugly
code.

Terje pointed out that classes are not the only way to group things, and may not
be the best. In fact, namespaces would be better when it came to autoloading,
because they can typically be re-opened and so span several files. Besides, Terje
saw free-standing functions as one of PHP’s main advantages over Java.

Stas pointed out that classes are the only way to group things that happens to be
native to PHP. He really didn’t like the idea of autoloading a namespace spanning
several files (‘how do you suppose autoloading function would know which file to
load?’), and didn’t see that being able to call f() was so much better than having
to call classWrapped::f() (‘it’s just another way to say the same’). All in all,
Stas didn’t see that making functions behave as
if they were class members without mentioning the word ‘class’ was the best idea to
support.

Terje wasn’t sure he followed that logic, and pointed to Java’s support for
calling classes by their full package.name.Class::f() or just
Class::f() if import is used. He wasn’t arguing for a
function to magically become part of a class or module… Stas wondered, in that
case, how Terje thought such a thing as he was proposing would be used? Terje agreed
that this was a good question. The obvious solution of having one file per function
would mean too much overhead… maybe function autoloading wasn’t such a good idea,
after all.

François pointed out that this obvious solution only allows one symbol per file,
and this wouldn’t help anybody. A smarter autoload manager was needed. He gave
another link to his own project – this time to the component itself – and
suggested that this, or something
approaching it, could be rewritten in C and integrated into the PHP core. Surely, if
a cache were added to keep symbol maps in memory, the speed increase due to JIT
loading would balance out the autoloading overhead?
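
As a rough idea of what an in-memory symbol map might look like in C, here is a minimal sketch. It assumes a map built offline and kept sorted by symbol name. The struct, the function file_for_symbol() and all file names are invented for illustration; none of it is taken from François' tool.

```c
#include <stdlib.h>
#include <string.h>

/* One entry of a hypothetical symbol map: which file defines a symbol. */
struct symbol_entry {
    const char *symbol;
    const char *file;
};

/* Must stay sorted by symbol (strcmp order) for bsearch().
 * Symbols and file names are made up for this sketch. */
const struct symbol_entry symbol_map[] = {
    { "APP_VERSION",  "lib/constants.php" },
    { "Logger",       "lib/logging.php"   },
    { "format_date",  "lib/helpers.php"   },
    { "parse_config", "lib/helpers.php"   },
};

/* bsearch() comparator: the key is a plain C string. */
int compare_symbol(const void *key, const void *elem)
{
    const struct symbol_entry *entry = elem;
    return strcmp((const char *)key, entry->symbol);
}

/* Returns the file that defines the symbol, or NULL if it is unmapped. */
const char *file_for_symbol(const char *symbol)
{
    size_t count = sizeof symbol_map / sizeof symbol_map[0];
    const struct symbol_entry *hit =
        bsearch(symbol, symbol_map, count, sizeof symbol_map[0], compare_symbol);
    return hit ? hit->file : NULL;
}
```

Several symbols can point at the same file here, which is the property François wanted: no one-file-per-symbol rule and no naming strategy.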

Short version: How to make a MINIT last much, much
longer.

TLK: Ending an endless loop

PHP user Ralph Schindler had found that an endless loop would simply cause a
segfault under PHP 5.1.6, rather than throw a fatal error. Was the condition
detectable at all? Should he file a bug report, or was this a known issue?

Johannes Schlüter confirmed briefly that this segfault is expected behaviour.
Stas elucidated; nobody has yet proposed a solution that will satisfy the following
conditions:

  • No false positives (i.e. good code always works)
  • No slowdown for execution
  • Works with any stack size

Unknown W. Brackets (we know him) wrote asking whether it’s possible to
determine the current stack position and maximum stack size on all architectures? If
so, perhaps an INI option could be added to trigger recursion depth tracking and/or
stack usage and throw an error when the limit is reached, thereby preventing the
segfault. He realized, however, that this approach would break two of Stas’ criteria
when the option was turned on – there could still be false positives and there would
definitely be a performance loss. It would only really be useful for development and
testing purposes.
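
The opt-in depth tracking described above might look something like the following C sketch. The setting max_call_depth stands in for the suggested INI option and, like every other name here, is hypothetical. The check costs a comparison on every call, and a legitimately deep recursion is rejected just like a runaway one. Those are the two criteria the approach was said to break.

```c
#include <stddef.h>

/* Hypothetical stand-in for the suggested INI option; 0 disables the check. */
long max_call_depth = 0;
long current_depth = 0;

/* Called on function entry. Returns -1 instead of letting the call
 * proceed once the configured depth would be exceeded. */
int enter_call(void)
{
    if (max_call_depth > 0 && current_depth >= max_call_depth) {
        return -1;
    }
    current_depth++;
    return 0;
}

/* Called on function exit. */
void leave_call(void)
{
    current_depth--;
}

/* Toy recursive function: returns n, or -1 if the depth limit was hit
 * somewhere below this level. */
long count_down(long n)
{
    long rest;

    if (enter_call() != 0) {
        return -1;              /* limit reached: report an error, no crash */
    }
    if (n == 0) {
        leave_call();
        return 0;
    }
    rest = count_down(n - 1);
    leave_call();
    return rest < 0 ? -1 : rest + 1;
}
```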

Short version: Still seeking the Holy Grail of endless loop
detection.

TLK: Windows libxml2 build

Stas was struggling to build the PHP_5_2 source under Windows using MSVC++.
Everything so far had come together excepting libxml2 dependencies, which didn’t
seem to work. He needed to know which libxml2 version is required for PHP 5.2, and
whether to use a static or dynamic libxml2 library. He’d also discovered that
ext/libxml/libxml.c uses the symbol xmlDllMain, which isn’t
defined in standard libxml2 builds. Finally, he noted that PHP 5 requires several
libraries that PHP 4 did not, and yet those libraries are not included in the
win32build zip file available on php.net. Given that binaries sourced
elsewhere don’t always work cleanly with PHP, wouldn’t it be a good idea to update
that file with the binaries needed to build PHP 5?

XML guru Rob Richards was quick to respond. The PHP_5_2 branch uses the static
version of libxml 2.6.26 (currently), built using the
LIBXML_STATIC_FOR_DLL flag. A pre-built copy is available from his own site. He was working
on getting this library built by default within the standard libxml2 release, and
when that comes about Rob will change the name of the library – probably to
libxml2_a_dll.lib.

As for the other matter, the only way to keep win32build.zip up to date
would be to version it. As libxml2 is under development, the symbols exported from
the .def file in the PHP extension are updated to match the new version every
time there is a bundled libxml2 upgrade. Non-matching symbols would prevent PHP from
building successfully.

‘Ouch’, wrote Stas. Still, if versioning win32build.zip is the only
way to achieve it, he guessed versioning was needed; as things are, there is nothing
obvious about building PHP 5.2, and he didn’t see why it should stay that way. Stas
also wanted to know about the non-standard URL, which – as luck would have it – he
was unable to reach immediately. Rob explained that it was needed because the
libxslt build available there had been customized specifically for PHP since day
one. The libxml2 build, on the other hand, had only been non-standard for the past
few days. For that reason, older versions of PHP would still build with the
appropriate standard libxml libraries; there would, however, be a thread cleanup
issue in any PHP build that used them. Stas advised Rob at this point that the
standard Windows distribution of libxml2 is in fact not usable with Visual Studio
6.0 – it’s built using a later version of MSVC++, which uses different library
symbols.

Short version: It seems Rob now needs to supply three libraries rather
than two.

TLK: Internals newbies

Someone named Daniel had been trying to build a string class. He wrote to
internals@ asking how existing PHP function names can be used as method names in OO
extensions. His experiments had compiled properly, but threw an
E_WARNING (‘cannot redeclare...’) at runtime when the
methods were called.

Pierre-Alain Joye assured him that it is indeed possible to do this in PHP 5.2,
and pointed him to ext/zip for examples. Having taken a look, Daniel claimed
he’d approached his extension in much the same way. Pierre asked him to post some
code, and Daniel became a contender for the Golden Email Award when we were treated
to the entire 300 LOC.

Sara Golemon glanced through it and wrote to explain that Daniel had registered
his methods twice – once as class methods and once more in zend_module_entry as
global functions, the latter triggering the error he was seeing. Replacing
string_functions with NULL there would kill the warning.
She went on to correct Daniel over a few other points. Naming his file
stringclass.h was a bad idea because it would break static linking, should he
ever want to build his extension statically. It needed to be
php_stringclass.h. There were a pair of macros defined in the code –
STRING_METHOD() and STRING_ME() – that did absolutely
nothing; did Daniel have a reason for using them? Finally, it’s common practice to
use uppercase for TSRM macros. ‘Not really an issue, but worth a
mentiCONFORM!!!!! YOU MUST CONFORM!!!!’

Daniel thanked her, and confessed to brazenly copying the macros from
ext/spl without understanding why they were there, in his search for the
magic ingredient. He had no idea where he’d got the TSRM globals declaration from –
his extension didn’t even have any globals! – so those lines could be
dropped. Finally, he asked whether there is any interest in OO representation of
primitive types? If so, he was prepared to release his extension as a PECL package.
Sara explained that the use of primitive or common names is discouraged in non-core
extensions, and referred Daniel to the Date Wars summaries in these very archives.
Although the trouble there arose over a PEAR package, the same principle applies to
PECL extensions.

Short version: Daniel was last seen thinking up new names for string and integer.

TLK: Module API changes

Michael Allen discovered the module API change between PHP 5.0 and PHP 5.1 when
he tried to load a copy of his extension compiled with the former, in the latter. He
wrote that this is problematic when it comes to binary distribution. Currently, he
has copies of his module built against PHP 4, PHP 5.0 and PHP 5.1 in his
installation package, and the install script checks phpversion() to
determine which of the three should be installed.

Michael wanted to know how often module API changes come about, and whether there
is an easier way to ship extension binaries? He also wondered whether PHP’s minor
version number is guaranteed to change alongside the module API, and whether there
had been any further module API changes since PHP 4 that would cause a module to
fail to load?

Pierre explained that there can be several changes in the internals API between
minor versions, and (we got there eventually) yes – the module API is different in
each PHP minor version. Michael was a little confused; his module compiled without
modification against PHP 4, 5.0 and 5.1. Pierre replied that some of his modules
also compile without modification across those versions. The problem isn’t in
compiling the module, but in loading it, for which the extension’s module API number
needs to match that of the PHP version. He added that the internal changes would be
explained in the README.UPDATE_5_2 file.
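
The gap between compiling and loading can be sketched in a few lines of C. This is a simplified model, not PHP's loader: the struct, the function and the API numbers are all invented for illustration. The only assumption carried over from the thread is that a binary records the module API number it was built against, and that the host compares it with its own at load time.

```c
#include <stddef.h>

/* Invented API number for the running host in this sketch. */
#define HOST_MODULE_API_NO 200

/* Hypothetical, cut-down module descriptor. A real build would stamp
 * api_no from the headers the extension was compiled against. */
struct module_entry {
    const char *name;
    int         api_no;
};

enum load_result {
    LOAD_OK = 0,
    LOAD_API_MISMATCH = -1
};

/* The same source may compile cleanly against several hosts, but each
 * resulting binary loads only into a host with a matching number. */
int load_module(const struct module_entry *module)
{
    if (module->api_no != HOST_MODULE_API_NO) {
        return LOAD_API_MISMATCH;
    }
    return LOAD_OK;
}
```

Under this model Michael's installer approach, one binary per minor version, follows directly: the source is shared, but each build carries a different stamp.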

Having just finished editing that file, I intervened to say that internals
changes wouldn’t, in fact, be mentioned there at all. The upgrade guide is intended
for PHP users rather than extension authors, and the module API changes – which were
extensive in PHP 5.2 – are in no way a user issue. Pierre claimed that Rasmus
Lerdorf had asked for internals changes to go into that file, which was news to me.
I reluctantly supplied a patch saying something about the new module API in terms
most users would understand; happily, despite Pierre’s assertion that this thread
justified its inclusion, my patch was never applied. Both the extension API changes
and the module API changes in PHP 5.2 would more usefully be documented in the file
put into CVS for that purpose, README.EXTENSIONS – but still haven’t been, at
the time of writing.

Philip Olson chose this moment to announce that the PHP Documentation Team have
decided to create migration pages for minor releases, and are intending to use the
distributed upgrade files as a basis for this. However, this still doesn’t help
extension authors particularly.

Short version: README.EXTENSIONS needs updating.
(Oops.)

NEW: PHP 5.2.0 RC4

Ilia Alshanetsky, as Release Master for the PHP 5.2 series, announced PHP 5.2.0
RC4 as follows:

An hour or so later, Windows maintainer Edin Kadribasic announced the
availability of the Windows binaries and, for the first time, John Mertic’s new
installer:

Short version: A moment of truth for John.

REQ: Filter release

Immediately following Ilia’s RC announcement, Pierre notified the internals list
that he would like to roll a bugfix-only release of the filter extension the next
day, if there were no objections. Although noting that Derick Rethans had said on
IRC that he wanted an API shake up for the extension, Pierre felt strongly that it
was too late in the PHP 5.2 release cycle to do this. Unless Derick could convince
Ilia to have a further release candidate, it would be dangerous to make radical
changes to the API now.

Somewhat predictably, Derick asked Pierre to hold off releasing anything for now.
He argued that having a consistent API takes priority over all other considerations,
and he was unwilling to introduce a new extension with a broken API. Pierre argued
that this mini-release would validate both his fixes and the current implementation.
Further, he felt that stability was even more important than consistency; when it
came to the filter extension, the team couldn’t afford to risk a broken release.
Derick should either work on convincing Ilia of the need for extra time, or wait for
a chance to make his changes in PECL.

Short version: Uh-oh.

TLK: Memleaks and virtual_file_ex

Nuno Lopes had found some memory leaks associated with
virtual_file_ex() usage. On investigation, he believed the problem was
that the TSRM function doesn’t free the new cwd on error. He wasn’t
sure whether the cleaning should be the responsibility of the function itself or of
the caller, but recommended that the convention – whichever it was – should be
defined, and the leaking areas patched accordingly. He followed up his analysis with
a Zend Engine patch to fix the first of the leaks, but mentioned in his
accompanying note that there might be other virtual_file_ex() issues
that he hadn’t fully tested.
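
One way to pin down the convention Nuno asked for is to make the callee responsible for its own allocations on every error path. The C sketch below illustrates that choice with a hypothetical resolve_path() helper. It is not virtual_file_ex() and not Nuno's patch, and the ‘..’ check is only a toy failure that happens after the buffer has been allocated.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not the TSRM function.
 * Convention: on failure the callee frees whatever it allocated and
 * leaves *out set to NULL; on success the caller owns *out and must
 * free() it. */
int resolve_path(const char *cwd, const char *path, char **out)
{
    size_t len;
    char *buf;

    *out = NULL;
    if (cwd == NULL || path == NULL) {
        return -1;
    }

    len = strlen(cwd) + 1 + strlen(path) + 1;   /* cwd + '/' + path + NUL */
    buf = malloc(len);
    if (buf == NULL) {
        return -1;
    }
    strcpy(buf, cwd);
    strcat(buf, "/");
    strcat(buf, path);

    /* Toy validation step that can fail after the allocation. */
    if (strstr(path, "..") != NULL) {
        free(buf);              /* the error path cleans up after itself */
        return -1;
    }

    *out = buf;
    return 0;
}
```

The opposite convention, where the caller always frees, would work equally well. What matters is that one of the two is written down and followed.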

Ilia, reviewing Nuno’s patch, wondered if there wasn’t a simpler fix for the leak
– but Dmitry Stogov later applied it as-is.

Short version: Walking on the edge.

RFC: Filter API proposal

Pierre felt that it was so important to get consensus over the filter API now
that the extension should be removed from PHP 5.2.0 if this proved impossible. He
therefore mailed a proposal for the API to internals@, asking for comments.

Pierre asked all to note that his proposal did not reflect the version currently
in CVS. It included throwing out filter_data and having
input_get() use INPUT_DATA in the same way as
input_get_args(). Also, input_get() would accept only the
filter type as an argument by default; if flags or options were needed, they and the
filter type should be passed to the function as an array. Finally, Pierre asked
respondents to focus on the API itself in their replies; missing filter types,
options or flags could be added at any point after the 5.2.0 release.

Christian Schneider was the first to respond. He liked input_get(),
but unfortunately got hung up on the existing documentation when it came to
input_get_args(). Andrei Zmievski was next. He put in a request for an
optional charset parameter in input_get(), pointing out
that it would be needed in PHP 6 anyway.

Dan Scott wanted some of the function names changed. He didn’t think
input_name_to_filter() was suggestive of a filter ID return value; it
just sounded as if a name would be sent into a filter.
input_filters_list() was self-explanatory, but he’d noticed that the
general structure of PHP function names tends to be
<prefix>_<verb>_<noun>; it would be more consistent
if this function were named input_list_filters(). Outside that, he was
happy with the proposal.

Ilia thought the names should be kept short to reduce the likelihood of typos,
given that the team expect ext/filter to be both widely adopted and
frequently used. Dan agreed, and suggested input_id_filter() as a
replacement for input_name_to_filter(). Pierre argued that missing the
‘to’ part would lead to confusion over the function’s role. Besides, the ID is
‘internal, numeric and defined by a constant’; it would be less misleading to
have name_to_id() than id_to_filter(). That said, the
function in question would probably be used about once a month – it wasn’t that
important. Ilia felt that the functions should all be prefixed with
filter_ anyway, which would have the side benefit of allowing much
shorter names. input_list_filters(), for example, would become
filter_list(). Andi Gutmans backed Ilia, pointing out that this is in
fact the standard for function names in PHP extensions. Pierre promptly came up with
a list:

‘Is it ok?’, he asked hopefully. Zeev Suraski and Dan agreed
that they liked the look of that. Ilia did too, mostly, but wanted to explore the
option of renaming filter_get_args() to
filter_get_variable() to make it consistent with
filter_has_variable(). Mike Wallner preferred
filter_get_var() and filter_has_var(), but Pierre pointed
out that there was in fact an ‘s’ on the end of
filter_get_args() for a reason. He had a personal preference for
args over vars, in any case.

Short version: Talking of args – where’s Derick?

CVS: More about filter

  • Zend Engine bugs #38779 (engine crashes when require()’ing file with
    syntax error through userspace stream wrapper) and #38772 (inconsistent overriding of
    methods in different visibility contexts) were fixed [Tony Dovgal, Dmitry
    Stogov]
  • The Zend Engine gained a new API function, is_zend_mm(), to allow
    runtime checks for the Zend memory manager. It also lost an old macro,
    USE_ZEND_ALLOC [Dmitry]
  • Streams bugs #38096 (large timeout values ignored on 32-bit machines in
    stream_socket_accept() and stream_socket_client()) and #37779 (empty
    include_path leads to search for files inside /) were fixed [Ilia]
  • Feature request #37923 (Display constant value in reflection::export) was
    implemented [Johannes Schlüter]
  • Network bug #38687 (sockaddr local storage insufficient for all sock families)
    was fixed in PHP_5_1, PHP_5_2 and CVS HEAD [Sara]
  • In ext/gd, bug #38801 – the result of a failed merge following an earlier bug
    fix – was fixed in the PHP_5_1 branch [Pierre]
  • FastCGI bug #38757 (MultiPart form uploads fail with FastCGI) was fixed [Dmitry]
  • In ext/dom, bugs #38813 (DOMEntityReference->__construct crashes when called
    explicitly), #38823 (DOMComment->appendData does nothing) and #38850
    (lookupNamespaceURI doesn’t return default namespace) were fixed [Rob]
  • In ext/curl, bug #38844 (curl_easy_strerror() is defined only since cURL
    7.12.0) was fixed [Tony]

Derick had meanwhile started looking at the existing filter API. He made it
possible to pass filter flags as LONG again, and put in a quick workaround to make
super globals filtering work in 5_2, noting as he did so that he’d need to ‘get
things sorted out’ before merging his changes to CVS HEAD. He then tracked down the
change that had broken the default filter for the request parameters to Tony’s fix
allowing multiple filters, and wrote him a note to say so. Tony simply asked how to
reproduce the problem. Pierre explained to Derick that Tony’s patch had fixed a
critical bug in which any single call to the filter functions could permanently
remove the original data; the ext/filter test 035 measured its success. He hadn’t
noticed the default filter breakage because it wasn’t possible to test it in certain
conditions. Tony helpfully named the certain conditions: the existence of any of the
environmental variables SERVER_SOFTWARE, SERVER_NAME,
GATEWAY_INTERFACE or REQUEST_METHOD would prevent a call
to php_getopt(). He followed up with a patch for Derick to check. Pierre
confirmed that the patch allowed both test 035 to pass
and the default filter to work.
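
The condition Tony described can be written down as a small C check. This is a sketch only: the function name looks_like_web_request() is made up, and the code is not copied from the PHP source. It tests for the four environment variables named above, any one of which would mean the command-line option parsing is skipped.

```c
#include <stdlib.h>

/* Returns 1 if any of the four variables named in the summary is set
 * in the environment, 0 otherwise. Hypothetical helper. */
int looks_like_web_request(void)
{
    static const char *const vars[] = {
        "SERVER_SOFTWARE",
        "SERVER_NAME",
        "GATEWAY_INTERFACE",
        "REQUEST_METHOD"
    };
    size_t i;

    for (i = 0; i < sizeof vars / sizeof vars[0]; i++) {
        if (getenv(vars[i]) != NULL) {
            return 1;
        }
    }
    return 0;
}
```

A test harness that happens to export one of these variables would take the web-request branch, which fits Pierre's remark that the breakage could not be tested under certain conditions.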

Short version: Tony saves the day.