Zend Weekly Summaries Issue #283

      Comments Off on Zend Weekly Summaries Issue #283

FIX: RETURN_RT_STRING
FIX: OS X and .dylib
TLK: Coverity
BUG: zend_highlight core dump
TLK: DB2 support
REQ: PDO_Firebird maintainer
CVS: Zend API bump averted
PAT: PCRE, ext/filter and LDAP

FIX: RETURN_RT_STRING

Still working her way through the code base for PHP 6, Sara Golemon noticed a
bunch of failed tests due to memory leaks and tracked down the cause. In Unicode
mode, the family of macros with the


RETURN_STRING(s, 0);

semantic become


RETURN_RT_STRING(s, 0);

with the binary value being translated to Unicode using a conversion function
called by the macro. The value of the original s was leaking memory
during the conversion process. She’d found some places in the source where this
issue had been handled, sort of:


    RETVAL_RT_STRING(s,
0);
    if (UG(unicode)) {
        efree(s);
    }
    return;
}

which is, as she said, perfectly valid in itself, but ‘a little
inconsistent
‘ with the standard RETURN_STRING()family semantics,
not to mention problematic for RETURN_RT_STRING() usage in general.

Sara offered up two solutions; either modify the macro definitions to call
efree() where there is a copy, or modify the conversion macro to accept
a flag that would trigger a call to efree(), and alter the
RETURN_RT_STRING() macros to support it. She asked for feedback.

Dmitry Stogov replied, saying that the first solution would fail because a string
passed to ZVAL_RETURN_RT_STRING() cannot use emalloc().
The second solution would work, but had Sara considered reusing the second argument
so that 2 could mean ‘duplicate and free’?

Sara wrote that she’d pasted the wrong version of the first solution into her
initial post; it should’ve read ‘modify the macro definitions where there is NOT a
copy’. The second parameter, duplicate, should only ever be set to
0 where the string is allocated memory with
emalloc(), ‘otherwise the Engine would get in trouble freeing
it later on
‘. Given this consideration, she offered a further solution; leave
the existing prototypes as they are and have them call efree() where
there is no copy. This would involve creating a new set of macros,


*_RT_STRINGL_NOFREE(s, len)

to be used when duplication is not needed but s should not be freed,
allowing edge cases to reuse s. Sara added that a RETURN
variant wouldn’t be needed in this case, and in fact would be guaranteed to leak in
Unicode mode. She and Andrei Zmievski had discussed Dmitry’s idea of reusing the
duplicate argument, but had rejected it on the grounds that it was
inconsistent with the existing macros; duplicate has always been a
binary value until now.

Dmitry pointed out that there is in fact a way to avoid Engine problems without
calling emalloc(). ‘You can use ZVAL_RT_STRING(&fname,
"strlen", 0)
, then call zend_call_function(&fname) and do
not destroy fname
‘. Sara, who hadn’t thought of that, followed
through with an updated list of options:

  1. Assume auto_free for RETURN_RT_STRING(s, 0)
    only
  2. Expand *_RT_STRING(s, 0) to include a new auto_free
    argument
  3. Overload the existing duplicate argument to include auto-free
    logic
  4. Duplicate the macros to one auto-free, and one non-auto-free version

Her own preference was for a combination of 1) and 2), but Dmitry pointed out
that 3) would require fewer changes, and demonstrated how it could work by drafting
two flags and rewriting the ZVAL_U_STRINGL() macro to use them:


#define S_DUPLICATE (1<<0)
#define S_AUTO_FREE (1<<1)

#define ZVAL_U_STRINGL(conv, z, s, l, flags)
if (UG(unicode)) {
    UErrorCode status = U_ZERO_ERROR;
    UChar *u_str;
    int u_len;
    zend_convert_to_unicode(conv, &u_str, &u_len, s,
l, &status);
    ZVAL_UNICODEL(z, u_str, u_len, 0);
    if (flags & S_AUTO_FREE) {
        efree(s);
    }
} else {
    char *__s=(s); int __l=l;
    Z_STRLEN_P(z) = __l;
    Z_STRVAL_P(z) = ((flags & S_DUPLICATE)?estrndup(__s,
__l):__s);
    Z_TYPE_P(z) = IS_STRING;
}

The advantage of the approach would be that only those calls that actually leak
would need to be altered. Sara agreed, but changed the prefix for the new constants
to ZSTR_ in the full patch she provided. Dmitry okayed the patch, and
asked her to update the *_ASCII_STRING() family in the same way; he’d
found that they suffered from the same problem.

Usage is as follows, with duplicate (now flags) having
the option of a new value, 2, to trigger automatic freeing of the
original string during Unicode conversion:


/* Uses s as-is for non-unicode, or frees s
after converting to unicode */
{
    char *s = estrdup("Hello");
    ZVAL_RT_STRING(pzv, s, ZSTR_AUTOFREE);
}

/* Duplicates for non-unicode, or converts (but doesn't free original) */
{
    char *s = "Hello";
    ZVAL_RT_STRING(pzv, s, ZSTR_DUPLICATE);
}
/* Uses as-is for non-unicode, or converts (but doesn't free original) */
{
    char *s = "Hello";
    zval zv;
    ZVAL_RT_STRING(&zv, s, 0);
}

/* Uses as-is for non-unicode, or converts (but doesn't free original) */
{
    char *s = "Hello";
    zval zv;
    ZVAL_RT_STRING(&zv, s, 0);
    /* use zv for some temporary purpose */
    /* It's now necessary to manually free this generated
value */
    if (UG(unicode)) {
        zval_dtor(&zv);
    }
}

Short version: A lot of deep thought went into this pluggage.
Appreciate it.

FIX: OS X and dylib

Christian ‘Chregu’ Stocker wrote to internals@ to report that iconv detection no
longer worked in the PHP_5_1 branch under Mac OS X, following an earlier M4 patch
from Andrei. He didn’t want to revert the patch because he figured Andrei must have
had some reason for making the change…?

Andrei responded, explaining that SHLIB_SUFFIX_NAME is used to
represent both .so (bundles) and .dylib (dynamic libraries) under OS X
during compilation. They should be treated separately; he’d considered introducing
an additional variable specifically for the platform, but had hit a snag at the
point where PHP_SHLIB_SUFFIX is registered; which variable should that
be?

For some reason Andrei and I were chatting that night, and I’d been up to my neck
in M4 code for the PHP-GTK project for the previous two weeks, so between us we came
up with a workable solution fairly quickly. Andrei posted the resulting patch [dead link] to
the list, commenting that it worked on his own Mac but he needed feedback from
people on other platforms before he felt confident enough to apply it. Chregu tried
it and found the typo, but reported that it worked otherwise. I posted the working
version to him off-list to double-check; he’d fixed a symptom rather than the cause.
Success! Andrei subsequently applied the patch in CVS, sans typo.

Later in the week, Jani Taskinen came home from Afghanistan on leave and went
ploughing through the entire PHP code base, muttering under his breath as he went.
Our brave new macro was one of the many things that got changed along the way.

Short version: Presumably it still works…?

TLK: Coverity

Nuno Lopes wrote to the internals list asking whether anyone else had
received the Coverity report‘?
He was hoping to help fix the remaining detected bugs reported by
the software on the site, but had had no response to his application.

Tony Dovgal replied that several people – himself included – have had access for
some time, and advised Nuno that applications can take a while to process. He added
pointedly, ‘s/bugs/pieces of code that Coverity doesn’t like/‘ – there were a
lot of false positives in among the initial 204 bugs reported for the PHP
source. Nuno wondered whether any real bugs had actually been found through the
Coverity scan, and Tony admitted to having found a couple of potential problems that
way. Marcus Börger, presumably catching up on this exchange, started labelling his
Coverity-related commit messages later in the week; but Nuno – and also Michael
Vergoz – were still waiting for their accounts by the end of it.

Short version: We’re still not sure how useful Coverity is at this
stage.

BUG: zend_highlight core dump

A fairly bemused Rasmus Lerdorf posted to internals@. He’d spent a full day
trying to track down a repeated crash under current PHP_5_1 on one of the servers
Yahoo! donated to php.net, and had run out of ideas. The crash was always on a
request to the
MySQL reference page
in the manual, and occurred during a call to
highlight_string() for one particular user note. He’d taken the
raw note data from
the backend files and attempted to reproduce the crash with it, but it was fine in
itself. ‘Somehow CG(op_array) is getting set to crap coming into
that call, but only sometimes. Yet if it is random memory corruption, why is the
crash always exactly the same?

He provided a
backtrace
, but went on to say that he’d had no luck when trying to reproduce the
crash under valgrind either. It could be 64-bit specific, or he might have somehow
set up the environment differently, or there could be some kind of re-entrancy
problem…

Later, Rasmus posted again to report that the most recent core, although still in
highlight_string(), had been sparked by the first user note on the
Brasilian Portuguese copy of the register_shutdown_function()
manual page
. Which rules out re-entrancy, at least.

Short version: Mystery crash in 5_1 branch, probably 64-bit
related.

TLK: DB2 support

IBM developer Kellen F Bombardier wrote to the
PECL development list
to complain that, although he’d been able to create the new
directory pecl/pdo_db2 without any difficulty, he was unable to commit any files
to it. Could someone please give him the karma to access the directory?

A furious Wez Furlong – known as ‘the King of PECL’ for the very good reason that
he generally controls what happens in the repository there – let Kellen feel the
long end of his whip. (He didn’t say it, but creating a directory for a new PECL
extension without mentioning its existence to the community beforehand is generally
considered quite rude.) Wez has been keeping PDO_ODBC – which has DB2 support –
‘clean IP’ from the start, in the hope that IBM would eventually contribute to it.
Dan Scott – who is no longer working for IBM – had given him reason to believe there
was a strong likelihood of this happening. Many of the developers working on the PHP
project are unhappy about not having access to the module, and of course Wez has
also had to support PDO_ODBC entirely unaided as well as taking the flak behind the
scenes from those unhappy developers. ‘And now you’re (IBM) going to release your
own driver anyway?
‘ he asked incredulously.

Wez went on to ask for a clarification of IBM’s position, pointing out that the
rest of php.net can’t contribute to IBM code without a CLA. He personally had been
keeping everyone away from his own code where it might become part of an IBM
project, because there is no clear process in place. If IBM planned to create a
further PDO_DB2 driver that the rest of php.net couldn’t contribute to, concluded
Wez, it was probably time that IBM kept their code in their own repository and
simply published their package releases via PECL. He could remove IBM code from the
PECL repository and the ACLs from the PDO_ODBC driver, and allow everyone else to
contribute once more.

Kellen replied that IBM’s position is based on their concern for a stable PDO
driver for DB2. They felt that DB2-specific nuances might not be catered for in a
non-specific driver. As far as the political aspect of the situation went, he had
started internal communications on the topic and the appropriate groups within IBM
would work to find a solution; he promised to keep the community updated over this.
In the meantime, Kellen had removed the PDO_DB2 project page and pledged to work
towards a more community-based approach to fulfill the company’s needs. However, at
this point he wasn’t certain whether this would be through PDO_ODBC, PDO_DB2, or
some entirely new module; he cited a couple of bugs he’d opened
against PDO_ODBC that had led to the perceived need for a specific DB2 driver.

Wez pointed out that ‘having two pieces of code that are > 90% the same,
maintained by two people that are not legally allowed to communicate code, doesn’t
help either piece of code
‘. Not only would there be a duplication of effort, but
the differences in implementation over time would be likely to lead to ever more
subtle problems. On the other hand, he had absolutely no problem over adding
conditional DB2 specific code to PDO_ODBC. Wez agreed that having a single code base
any single code base – the teams could collaborate on would be a step in
the right direction, but argued that the whole point of keeping the PDO_ODBC module
clear in the first place had been to allow precisely that. Noting (silently) that
both Kellen’s bug reports were against the PHP 5.1.2 release rather than a PHP_5_1
branch snapshot, Wez added that he believed he’d fixed related problems in PDO and
PDO_ODBC recently, and this later proved to be the case. He ended by reiterating
that there was absolutely no point in replicating code, before thanking Kellen for
his efforts on the political front.

Short version: When worlds collide…

REQ: PDO_Firebird maintainer

PEAR contributor Lorenzo Alberton wrote in to ask about the current status of the
PDO_Firebird module. He’d tested it a little, and believed it wasn’t on par with the
other PDO drivers. Was there some comparison table or roadmap he could refer to to
check on the features that had been implemented in individual PDO drivers? Was
Firebird/InterBase guru Ard Biesheuvel still active in PHP development?

Marcus confirmed that there is no such table; theoretically at least, all the PDO
drivers should support the full range of functionality once they were working, but
he hadn’t been able to generate a gcov report
for PDO_Firebird. The module appeared to have been
abandoned. He also hadn’t heard from Ard in a while, and put out a call for him on
the list.

Firebird user Lester Caine wrote that the lack of support for certain key
Firebird/Interbase features in PDO meant that the native driver would still need to
be used alongside any ‘limited PDO port‘. He implied that this was the reason
Ard hadn’t put any further effort into the driver, which isn’t entirely the case; it
was more that the aims for PDO in the early stages of the PHP 5.1 series were
irrelevant to Firebird. Moreover,
taking PDO_Firebird to the point where it becomes useful will involve a lot more
work than has been the case for any of the other PDO drivers, thanks to the unique
design goals in Firebird itself.

Lorenzo hoped Ard would still be interested in working on the PDO driver, given
that he’d made such a good job of the original interbase extension in PHP 5, but Wez
informed him that he was actively looking for someone to maintain PDO_Firebird. At
this point Lorenzo did two remarkable things. He posted ‘a call for help‘ on the
Firebird news page, and wrote a full online report regarding
the current status of the driver. If only everyone could be so pro-active.

Short version (thanks Lorenzo): The PDO_Firebird driver needs A LOT of love.

CVS: Zend API bump averted

Changes in CVS that you should probably be aware of include:

  • A new function, mb_check_encoding(), was added to the
    mbstring extension in the PHP_4_4 branch. The function is security-related;
    it provides a means of detecting a potential encoding attack. [Seiji
    Masugata]
  • Also in ext/mbstring, new functions mb_stristr() and
    mb_strrichr() were added – this time in CVS HEAD only [Seiji]
  • Functions using the new Zend API functions zend_object_std_init()
    and zend_object_std_dtor() were #ifdef‘d out in the
    tidy, simplexml, xmlwriter and xmlreader extensions to
    avoid the need for a Zend API number change in the PHP 5.1 series. (The check will
    be removed in PHP_5_2 branch.) [Tony]
  • In ext/spl, bug
    #36941
    (ArrayIterator does not clone itself) was fixed in CVS
    HEAD and PHP_5_1 [Marcus]
  • ext/mssql bug
    #33694
    (invalid SQL or timeouts makes it impossible to reuse persistent
    connections) was fixed across all active PHP branches [Frank Kromann]
  • Core bugs #36957
    (serialize() does not handle recursion) and #36875 (is_*()
    functions do not account for open_basedir) were fixed in HEAD and 5_1
    [Ilia Alshanetsky]
  • Zend Engine bug #36944
    (strncmp() & strncasecmp() do not return
    FALSE on negative string length) was fixed in HEAD and 5_1
    [Tony]
  • In ext/oci8, bug
    #36934
    (OCILob->read() doesn’t move internal pointer when
    reading 0’s) was fixed in HEAD and 5_1. Note: there is no support for Unicode
    CLOBs prior to Oracle 10. [Tony]
  • ext/spl bug
    #36981
    (SplFileObject->fgets() ignores
    max_length) was fixed in HEAD and 5_1 [Tony]
  • pdo_mysql bug #5827
    was fixed in 5_1 and HEAD, making the MySQL driver
    responsible for ‘gobbling up result sets in closeCursor() and the
    stmt destructor’ [Wez]
  • Also in ext/pdo_mysql, bugs #36602
    (persistent connections don’t work with MySQL 5.0.3+) and
    #6262
    (correctly fallback to emulated statements when the server version is too old)
    were fixed. Code was also included to default to using emulated prepared
    statements, but was not implemented due to the stage in the 5_1 release cycle
    [Wez]
  • A new general attribute, PDO_ATTR_EMULATE_PREPARES, was added to
    ext/pdo in HEAD and 5_1, replacing the custom attributes in the MySQL and
    Postgres drivers [Wez]
  • PHP 5.1.3 RC3 was rolled [Ilia]

Short version: More about RC3 next week.

PAT: PCRE, ext/filter and LDAP

Nuno had been looking into ext/pcre bug #33151,
and wrote to Andrei – as maintainer of the PCRE extension – to suggest that the
MATCH_LIMIT and MATCH_LIMIT_RECURSION parameters should
be tuned ‘to more realistic limits‘. He could see several approaches to this,
but Andrei replied that he’d been considering simply adding a couple of new .ini
settings to set them with. Nuno promptly posted
a patch [dead link]
to provide these, and wrote that he also felt there should be an
E_WARNING thrown when pcre_exec() fails. Derick Rethans
argued that this should only happen if the regular expression itself was the only
possible cause of failure, but Andrei pointed out that the subject string and
pattern work hand in hand; a ‘bad’ pattern might work perfectly well on one string
but fail on another. Nuno’s point was that currently there is no way to report the
error returned by PCRE. Perhaps every error returned by
pcre_exec() could be influenced by the subject string, but even having
an E_NOTICE would be helpful when it came to debugging. Looking into
it, Nuno went on to report that none of the potential return values could be
triggered by a ‘malicious’ string, with the exception of
PCRE_ERROR_BADUTF8 – and that, he argued, should have been validated
anyway.

Pierre-Alain Joye, meanwhile, had been looking into pecl/filter, and had
noticed that many of the extension’s test failures were related to the
FILTER_VALIDATE_REGEXP mode. He’d found that the mode uses PCRE
natively and is incompatible with ext/pcre, which ‘makes its usage a bit
tricky
‘. Pierre supplied a
patch
to make the mode use the ext/pcre API, commenting
that it would be better if every addition to ext/filter were to use it.

Ignacio Arenaza – evidently unaware of the extensive existing patches for the LDAP
module – posted his modifications
allowing it to use pagedResults. He’d tested his patch with W2003 AD
and OpenLDAP 2.1.x servers, using a Linux-based PHP client, and posted diffs against
all current PHP branches (and also PHP_4_3). We still don’t have a maintainer for
the LDAP module at present, which is why the existing patches are still sitting in
PAT; it would be good to find one, or at least someone equipped to fully test and
evaluate these offerings.

Finally, Ants Aasma’s patch introducing a new ext/gmp function,
gmp_nextprime(), was applied by Tony in CVS HEAD this week.

Short version: The LDAP extension urgently needs a maintainer.