Zend Weekly Summaries Issue #308

      Comments Off on Zend Weekly Summaries Issue #308

TLK: Filter API proposal [continued]
TLK: Internals newbies
TLK: Per-request Unicode
TLK: RSS bugs feed
TLK: Running a PHP script from an extension
BUG: Variable reuse
CVS: Unicode support for MySQLi
PAT: HashTable optimization

TLK: Filter API proposal [continued]

On with the heated debate.

As Derick Rethans ingenuously pointed out, the reason everyone had agreed to the
filter_ prefix in the function names was that this followed the coding
standards. The option to rename the extension ext/input was still open. He
still didn’t see how Pierre-Alain Joye could sensibly pass the filter itself as part
of an argument that also took an array of options and flags. They were two entirely
separate things. It wasn’t necessary to have a less flexible API just to separate
them. Derick also saw no good reason to force that parameter to be an array. If a
filter only takes flags, requiring users to write array('flags' =>
...)
is ‘annoying and unnecessary‘, in his view.

To Pierre’s argument that there’s no difference between filtering an existing
script variable and filtering input data, Derick retorted that there was a
conceptual difference that should be recognized, rather than having a single
function deal with both. As for dropping custom functions to filter script variables
who had requested dropping support for callbacks, and who had
approved it?

Derick’s objection to the array/scalar issue was that an array would be returned
even if the variable was a scalar‘. Data mangling simply was not acceptable;
the function should return FALSE if a scalar is passed to it, exactly
as it does when an array is submitted and a scalar expected.

Pierre was still upset over losing consensus through a single developer, although
in all fairness it should be noted that there was only discussion about function
naming on this thread until Derick’s arrival. If Derick wanted to drop ‘our
choices
‘, wrote Pierre, he should canvass for support from the rest of the
team.

Pointing out that input isn’t the only thing filtered by ext/filter,
Pierre – although still saying he didn’t care either way – accepted the original
choice of filter for the extension name, and thereby for its function prefix.
Passing the filter itself as part of an array meant that you could pass the same
array to filter_get() or as an element of a
filter_get_args() array. Derick, in Pierre’s view, only wanted to
change that array to mixed in the first place because he was expecting
flags to be the most common argument. This design was an arbitrary choice, based on
the current set of filters, and lacked logic. Pierre’s proposal didn’t have that
issue. Pierre also still didn’t see how there was a conceptual difference between
filtering an existing variable and filtering an input variable; the only thing that
was different about them was the data source. He wasn’t planning to drop support for
callbacks, just for filter_data… that had been badly expressed, and
he apologized for the ensuing confusion. Finally, why would anyone be expected a
scalar value if they’d passed an ARRAY flag?

Pierre concluded by saying that his patch was ready to commit, and he intended to
commit it. Regardless of the eventual outcome of the discussion, it would be more
straightforward to change the code after the patch had been applied than currently –
assuming an agreement was ever reached.

By Friday, Derick and Pierre were no closer to achieving consensus. Ilia
Alshanetsky, as Release Master for the PHP 5.2 series, needed to make a decision,
and asked for feedback from the developers and PHP users on internals@. The options
were:

Mike Robinson wanted the third of these options, but suggested that the
extension should be marked EXPERIMENTAL to make future API changes easier for the
team. That said, it depended on how long Ilia was prepared to wait for Derick and
Pierre to reach an agreement; obviously it would be better not to have any BC
breakage in the future… Richard Quadling and Kevin Waterson simply wanted to see
the extension stay in PHP 5.2, and didn’t care too much about the cost.

Derick wrote tartly to Ilia that he’d rather see him discussing the API than
calling for a vote on this. Pierre argued that Ilia had already given comments and
fixes, and had agreed to his own API proposal, as had everyone else…

Sebastian Bergmann felt it would be good to know who exactly has control over
ext/filter. Since the initial code was written by Rasmus Lerdorf and Derick,
he personally felt that they – and not Pierre – should have the final word when it
came to design decisions. Pierre wrote that this particular extension is so
important that it shouldn’t be left to one or two people to decide its future; the
whole team should be involved.

Tony Dovgal wanted to delay the PHP 5.2.0 release until the issue was resolved;
he added that he would like to see one more release candidate anyway, with a total
code freeze following it. Ilia confirmed that there would definitely be an RC5, but
following this he’d like to close the tree; that was why the filter issue needed an
immediate resolution. In that case, wrote Tony, he’d vote for going with whatever is
already in CVS and putting the onus on the extension developers to retain BC.

Short version: Happy days are here again.

TLK: Internals newbies

PHP user Stut had built an extension that uses a C++ library. It compiled okay,
but the build system insisted on linking with gcc despite him ‘telling it’ to use
g++. Was he missing something in his config.m4 script?


PHP_ARG_ENABLE(mmsr, whether to enable mmsr
support, [ --enable-mmsr=dir Enable mmsr support])

if test "$PHP_MMSR" != "no"; then
  PHP_ADD_LIBRARY_WITH_PATH(mmsr, "$PHP_MMSR")
  PHP_ADD_INCLUDE("$PHP_MMSR/MMSR")
  PHP_NEW_EXTENSION(mmsr, mmsr.c "$PHP_MMSR/MMSR/mmsr_bridge.cpp",
$ext_shared)
  PHP_REQUIRE_CXX()
fi

Stut had expected that PHP_REQUIRE_CXX() macro to trigger the
change, but it only seemed to influence the choice of compiler.

Tony pointed him towards pecl/rar for an example of a C++ extension
(actually there are a few, e.g. pecl/sdo) and suggested Stut used its
config.m4 as a template. Fellow extension author Olivier Hill thought
he had a solution, but ended by confessing to having the same issue with a C++
extension of his own. Still, he wrote, it works fine regardless. Sara Golemon simply
wrote:


PHP_NEW_EXTENSION(mmsr, mmsr.c
"$PHP_MMSR/MMSR/mmsr_bridge.cpp", $ext_shared,,,1)

Ref: aclocal.m4 / EEPHP:Chapter 17

Stut thanked everyone for responding, and explained that his order for Sara’s book hadn’t actually come through yet!

Next up was someone calling themselves ‘Prometheus Prometheus’ (real names seem
to be out of fashion just lately). S/he wanted to know whether it is possible to
pick up the return value of a function between the point where return
is called and the point where the value reaches the caller. This would be useful for
testing larger PHP applications; if it were possible to track the arguments (using
debug_back_trace()) and the return values, it would be possible
to automatically test different sets of input values for comparison. So, how exactly
is the return value stored, internally? Is there an existing userspace function that
s/he could use to retrieve return values?

Matt Sicker, happy to spot a question he already knew the answer to, replied
immediately that return values are stored in a zval pointer named
return_value. This is true of any function that returns a value using
the RETURN_* and RETVAL_* macros.

Derick gave a lengthier reply, explaining clearly that it is only possible to
pick up that value via an extension, i.e. not in userspace code, and there is no
core PHP function that does this. However, Xdebug does. Function and method names,
parameters and return values can be stored to a file on disk using the extension’s
auto_trace
facility
. If the user hoped to retrieve every variable set within a session,
s/he would need to reimplement the zend_execute() function within their
own extension to achieve it. Again, Derick referred to Xdebug, and suggested the
source code there as a good resource for this kind of experiment.

Short version: All things are possible.

TLK: Per-request Unicode

Andrei forwarded an email from Andi Gutmans announcing a patch Dmitry
had come up with allowing UG(unicode) to be configurable per-request.
As Andrei himself had requested, this implementation didn’t make extension code much
more complicated than it already is in CVS HEAD. Andi claimed it also should have
little impact on performance.

Andi went on to outline some of the reasons he saw per-request Unicode
configuration as essential. These included per-virtual host configuration, the
ability to host multiple applications on the same machine, and the ability to work
around a handful of BC quirks in the Unicode implementation from script level. He
asked the team to download and test on their own boxes, adding that unless there
were significant reasons not to include the patch, he felt strongly that it should
go in.

There was no immediate response on the internals list, and Andrei chased it up
the next day. Mike explained that he hadn’t had the opportunity to try it out yet.
Sara, who had, agreed that having per-dir support for the Unicode setting would be a
Good Thing, but felt that this particular patch would ‘make a soup of the
internal registries
‘. For that reason, she voted against its inclusion. Stas
Malyshev acknowledged that it would mean having two copies of the persistent system
hashes, i.e. the class and function tables etc, but wrote that he didn’t see this as
much of a soup.

Short version: Supporting the past can be very expensive
indeed.

TLK: RSS bugs feed

Andi wanted to know if there was an RSS feed in the bug tracking system? He felt
it would be helpful for maintainers trying to keep on top of bugs opened against
their area of PHP.

Philip Olson wrote that there was in fact an RSS feed already – it just wasn’t advertised.
He’d noticed, when checking this, that search.php and rss/search.php
take the same arguments; perhaps it would make sense to link the search results page
to the appropriate RSS feed? Andi felt that such a link would be extremely useful,
and asked if anyone had time to ‘hack it up‘. Hannes Magnusson did; he
implemented feed autodiscovery for both search.php and bug.php, but
felt that adding a link to the feed was redundant, ‘unless you are using a
browser from the stone age
‘. Pierre argued that a link would be more useful, and
Hannes was forced to recognize that not everybody configures their browser in the
same way he does. He duly added the link.

Sean Coates, coming late into the discussion, replied to Andi’s original post
with the news that the URL generated by an advanced search only needs /rss
adding to it to create a feed. ‘…or look at the bottom of the page where it
says “RSS feed”?
‘ wrote a slightly miffed Hannes. Ilia, who also hadn’t followed
the exchange in detail, wrote to tell Andi that he could in fact see an RSS link for
bug searches already, using Firefox or Safari. Hannes replied carefully,
….because I added it this morning, and the link at the bottom, “RSS
feed”
‘. Tony wrote helpfully that he could see the link using Opera too, and
Hannes gave up completely. Andi remembered him, though, and thanked him for his
help.

Short version: Speed is not always of the essence.

TLK: Running a PHP script from an extension

One Roland Schwingel wrote that he needed a portable way to call a PHP script
every n minutes – something like cron, but cross-platform. To this end,
he’d started work on a PECL extension to provide that service under PHP 5.1 and
Apache 2.2. The extension spawns a thread using Apache apr* functions,
and that thread is responsible for calling the PHP script. The code Roland had for
this was:


TSRMLS_FETCH();

zend_file_handle zfd;

zfd.type = ZEND_HANDLE_FILENAME;
zfd.filename = <absolute path to php script to execute>
zfd.free_filename = 0;
zfd.opened_path = NULL;

php_execute_script(&zfd TSRMLS_CC);

Somewhere along the line he must have missed something in PHP initialization;
when the function was called, PHP crashed during
php_stream_locate_url_wrapper() because the wrapper_hash
variable initialized at the beginning of the streams function had become a ‘bad’
value. Could anyone help?

Jasper Bryant-Greene observed that PHP might not be the correct tool for the
job.

Sara evidently thought so too – she wondered why Roland couldn’t simply use cron?
That said, she believed the problem was that the call wasn’t made in a request, and
probably not even in a unique thread context. It was possible to fix both issues by
creating a context at thread startup and wrapping a request activate/deactivate
either around the timer or just inside the thread context.

Roland explained again that he needed the extension to work cross-platform; he
reckoned ~85% of his applications were installed on Windows, and he wanted to keep
the setup simple. He hadn’t entirely understood Sara’s solution, and asked if she
could point him towards any example code. Richard Lynch thought Roland could
probably use the at command as a reliable cross-platform solution in
any case. If this proved unfeasible, it was probably possible to bundle an open
source, cross platform version of cron into the distribution process, which would be
a much simpler solution than ‘hacking up a bogus cron-like substance within
PHP
‘. Roland was ahead of him; he wrote that he’d tried the at
approach already, but wasn’t overly confident when it came to error logging. It had
also opened up too big a difference between the Windows and Unix handling of his
installation and maintenance processes… Richard Quadling pointed out that the two
are very different operating systems, and that any solution involving system
calls would need to be OS aware.

Short version: It’s possible, but complicated.

BUG: Variable reuse

Extension maintainer Frank Kromann wrote that he’d come across a problem in
ext/mssql under PHP 5.2. If he reused the same variable to store the result
of mssql_query() twice in succession, the second result was not always
correct. In other words,


$rs
mssql_query("select
1"
);
$rs = mssql_query("select 3 select 2 select
1"
);

did not work. He could work around this by calling
mssql_free_result($rs) between the calls; by calling
unset($rs) between the calls; or by storing the resource in two
different variables. However, the registered destructor was called in all four
situations. Why, then, did it matter whether or not the variable had been explicitly
unset prior to its reassignment?

Short version: Interesting question.

CVS: Unicode support for MySQLi

Changes in CVS that you should probably be aware of include:

  • In ext/zip, bug
    #38944
    (freshly created archive has no comment or cdir) was fixed
    [Pierre]
  • Core bug #38891
    (get_headers() do not work with curl-wrappers) was fixed [Ilia]
  • Zend Engine bugs #38942
    (Double old-style-ctor inheritance) and #38808 (“maybe ref” issue for current() and
    others) were fixed [Dmitry]
  • Core function metaphone() got itself both a 64-bit fix and a fix
    for bug #38961
    (metaphone() results in segmentation fault on NetBSD) [Ilia,
    Tony]
  • A potential open_basedir bypass in tempnam() was
    closed off in PHP_4_4, PHP_5_2 and CVS HEAD, fixing bug #38963 [Ilia]
  • Core bug #38859
    (parse_url() fails if passing @ in passwd) was also
    fixed across all current CVS branches [Ilia]
  • Core bug #38981 (using
    FTP URLs in get_headers() causes crash) was fixed in PHP_5_2 and CVS
    HEAD [Tony]

Adam Maccabee Trachtenberg had pulled up Rob Richards over his fix for DOM
bug #38949 (Cannot get
xmlns value attribute). He thought Rob should be using
getAttributeNS() to bind the xmlns prefix to http://www.w3.org/2000/xmlns/,
unless it was possible to use getAttribute() to pull out namespaced
attributes with prefixes? Rob responded that namespace declarations should have the
ability to be accessed in the same way as attributes, and he’d followed the
specifications regards handling them in DOM level 1 functions, which have no
namespace support. The only rule there for implementation when it came to prefixed
attributes was that set and get must work on the same
attribute. DOM level 2 functionality had not been affected by his changes. Tony
queried the lack of default action in setAttribute(), and reported
seeing a lot of warnings through that omission. Rob, taking his point, updated this
the following day.

Meanwhile Georg Richter heeded the call for Unicode updates in extensions and
provided them for his baby, ext/mysqli. Andrei Zmievski went through Georg’s
implementation with a fine-toothed comb and came back with a list of Unicode-only
macros that Georg evidently hadn’t come across. Once that initial list had been
utilized, Andrei came back with a few even less obvious Unicode-only macros. The
ever-stoical Georg implemented Andrei’s suggestions over the weekend.

Short version: A number of lesser security fixes went into 5_2 and HEAD
this week too [Ilia].

PAT: HashTable optimization

Patch king Matt Wilmas had noticed that almost every
zend_hash_init() call had nSize – representing the number
of elements to be added to the hash – set to 0. The way it works is
that, when there are more elements than will fit into the existing hash, more memory
is allocated to allow the table size to be doubled. The hash is then repopulated
from scratch. That initial setting meant that the table was almost always the
default minimum size; Matt thought it would probably speed things up a bit if the
actual number of elements were specified in zend_hash_init() where
possible, since it would save the rehash overhead. The benchmark results he’d seen
under PHP 5.2.0 RC4 varied from 14.2% to a whopping 80.4% with a large number of
elements; it seemed it was worth doing.

The actual change Matt proposed was simple; it involved changing a line in
_zval_copy_ctor_func() in zend_variables.c from


zend_hash_init(tmp_ht, 0, NULL, ZVAL_PTR_DTOR,
0);

to


zend_hash_init(tmp_ht,
original_ht->nNumOfElements, NULL, ZVAL_PTR_DTOR, 0);

Matt felt that some of the array functions – and functions that return arrays,
such as database fetches – could also be optimized in this way. Perhaps there should
be a macro, something like array_init_size(), that could be used where
the number of elements are easily determined? He’d be happy to hunt down areas that
could be updated in this way and modify them.

An impressed Nuno Lopes ran a quick check through the core PHP sources and came
up with a list of functions where the same optimization might be applied. They
included zend_default_exception_new_ex(),
convert_to_array(), zend_object_std_init(),
clone_wrapper_hash() and compiler_globals_ctor() – many of
which are used in extensions as well as in the Zend Engine and PHP core.

Short version: Effective research brings global benefits.