Zend Weekly Summaries Issue #333


TLK: Anonymous functions
TLK: GSoC - dbobj [continued]
REQ: getpass
TLK: GSoC 07 [continued]
TLK: More GSoC 07...
BUG: IMAP/GSSAPI auth failure
BUG: String BC break
TLK: Dealing with the old stuff
CVS: LDAP maintenance era
PAT: Black box free zone

TLK: Anonymous functions

Wez Furlong posted a proposal and patch for anonymous functions in PHP. This, he wrote, would allow users to do things like:

$data = array("zoo", "orange", "car", "lemon", "apple");
usort($data, function($a, $b) { return strcmp($a, $b); });
var_dump($data); # data is sorted alphabetically

Wez felt strongly that pulling the behaviour into the core would be preferable to 'the travesty that is create_function()', and explained that the changes needed to do so are minor. His approach was to have the expression:

function() {}

evaluate to the (generated) name of the anonymous function, so that:

$foo = function() {};

would set $foo to a string such as __zend_anon_1, which can then be passed around internally as a callback name. This is similar to the way create_function() works, but with the advantage that 'you don't need to use crazy quoting' to declare complex functions. Wez noted that his current patch wasn't perfect (and in fact found other minor problems during testing). However, his real question was whether anonymous functions should be part of PHP at all.
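To make the mechanism concrete, here is a minimal sketch of what the patch would effectively leave behind, assuming (as described above) that the expression evaluates to a generated function name. The function below is declared by hand under the proposal's example name __zend_anon_1; under the patch the engine would generate both the name and the body, so this is an approximation, not the patch's actual output.

```php
<?php
// Hand-declared stand-in for the function the patch would generate.
// The name __zend_anon_1 is the illustrative name from the proposal.
function __zend_anon_1($a, $b)
{
    return strcmp($a, $b);
}

// Under the patch, `$foo = function($a, $b) { ... };` would leave a
// plain string like this one in $foo.
$foo = '__zend_anon_1';

$data = array("zoo", "orange", "car", "lemon", "apple");

// Any string naming a function is already a valid callback in PHP,
// so usort() needs no changes to accept it.
usort($data, $foo);

print_r($data);
```

Because the result is just a string, it flows through every existing callback-accepting function unchanged, which is why the patch could stay small.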

Jan Lehnardt, Jim Wilson, David Zülke and Gwynne, Daughter of the Code (no, really) all backed the proposal, prompting a request from Tony Dovgal for the enthusiasts to write tests for the patch. Stas Malyshev was less certain. He wondered what would happen with:

$data = array("zoo", "orange", "car", "lemon", "apple");
$rev = 1;
usort($data, function($a, $b) { return $rev?strcmp($a, $b):!strcmp($a, $b); });
var_dump($data);

Stas believed it would be quite hard to make that work in PHP, because $rev would be in the wrong scope. And how about:

$f = function($a, $b) { return $rev?strcmp($a, $b):!strcmp($a, $b); }

- would this maintain a correct value in $rev? Wez explained that 'doing anything fancy with scoping' would a) complicate the implementation and b) mean explicitly referencing the global scope in order to break out of the function scope. In summary, 'it would be cool if the lexical scope was inherited, but maybe not cool enough to warrant making it work'. Stas argued that this was precisely the difference: inheriting the lexical scope is what would make it a closure. Without that, he simply saw the patch as 'a nice way to save a couple of keystrokes', and warned that people coming to PHP from other languages might expect it to act like a closure because of the way it looked.
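The scoping problem Stas raised can be reproduced with ordinary named functions, which is what the generated function would be. This sketch uses hypothetical function names and reverses the sort by swapping the strcmp() arguments rather than negating the result; it shows that $rev is invisible inside a function body unless it is explicitly imported from the global scope.

```php
<?php
$rev = 1;

// A function body gets its own local scope, so the outer $rev is
// simply not visible here and the order stays ascending.
function cmp_local($a, $b)
{
    return isset($rev) ? strcmp($b, $a) : strcmp($a, $b);
}

// Reaching the outer variable requires importing it explicitly, which
// only works because $rev happens to live in the global scope.
function cmp_global($a, $b)
{
    global $rev;
    return $rev ? strcmp($b, $a) : strcmp($a, $b);
}

$data = array("zoo", "orange", "car", "lemon", "apple");

$asc = $data;
usort($asc, 'cmp_local');   // $rev ignored

$desc = $data;
usort($desc, 'cmp_global'); // $rev honoured

print_r($asc);
print_r($desc);
```

If the same code ran inside another function, $rev would be a local there and the global trick would no longer reach it, which is the case Sean raises later in the thread.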

Marcus Börger, Sebastian Bergmann, Jacob Santos and Christian Schneider backed the initial proposal. Christian added that, in his opinion, bringing closures into PHP wasn't a good idea; he regarded them as having 'a high WTF factor', citing his own experience with closures and 'this' in Javascript as a cautionary tale. That said, he wondered whether generating an object method rather than a function might resolve the scoping issue.

Jon Parise took the time to review the actual patch, and came up with a couple of suggestions for improving it.

Sean Coates couldn't see a way for PHP to offer full closure support, given that - unlike Javascript - there is no way in PHP to access the parent scope unless it happens to be the global scope. That said, he liked the syntax Wez proposed for anonymous function declaration, although he had some queries about how it would tokenize. If it went the way he believed it would - with the anonymous function tokenized as T_FUNCTION rather than as T_STRING - it should be possible to compile the function body along with the rest of the script rather than at runtime. Stas agreed that this is a major advantage of the proposal, since it allows the function to be cached.

Thinking it over, though, Sean recognized that create_function() is most often used to create dynamic functions, and sent the discussion back to scoping, noting that variables used by such functions aren't necessarily declared in global scope. Wez had been thinking about ways to get around this too, and suggested a keyword to mark variables used in dynamic functions so that they could be inherited from the lexical scope in which the function was defined:

$ver = phpversion();
$fancyVer = function () { lexical $ver; return "PHP $ver"; };

However, Wez - admitting unfamiliarity with Zend Engine internals - remained uncertain what would happen if the function were called after the hash table representing its scope had been destroyed. He wondered whether it could be solved by storing a reference to $ver when the function is bound, but wasn't sure of the implications of this and suspected that a realistic solution would be far more complex.

Sean liked the idea of using a keyword to grab scope. He went on to explain how closures work in Javascript: the user-defined function maintains access to the parent scope even after the parent would normally have been destroyed. That said, functions in PHP are fundamentally different to functions in Javascript, which are objects that have access to variables from all parent scopes... Stas intervened to point out that variables aren't interpreted by the compiler in any case. Wez suggested that the compiler could make a list of variable names to import and store them in the zend_function struct, allowing the variable reference to be treated in the same way (he believed) as a global variable. Stas explained that global references are actually created at runtime, so binding to scope couldn't work in this way. That said, adding binding capabilities to DECLARE_FUNCTION could work, although it wasn't clear what would happen in the case of a loop, which in itself would be difficult to deal with at compile time. Stas also wondered how variable values might be added to the function symbol table at runtime. Would they be references, or would they be copies?
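Stas's closing question is the crux of any binding scheme. The difference can be shown with ordinary arrays standing in for a function's captured symbol table; the array names here are purely illustrative and not part of any proposal in the thread.

```php
<?php
$ver = '5.2.1';

// Binding by copy: the captured value is frozen at bind time.
$byCopy = array('ver' => $ver);

// Binding by reference: the captured slot follows later changes, and the
// value stays alive for as long as something still references it.
$byRef = array('ver' => &$ver);

$ver = '6.0.0-dev';

echo "copy: {$byCopy['ver']}\n";     // copy: 5.2.1
echo "reference: {$byRef['ver']}\n"; // reference: 6.0.0-dev
```

A reference would also answer Wez's worry about the defining scope's hash table being destroyed first, since the referenced value outlives the table; the price is that the function sees whatever the variable holds at call time, not at definition time.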

In what had been a sub-thread, Robert Cummings asked whether it wouldn't be reasonable to assume that the relevant scope was the immediate parent of the function scope. Stas explained that, although the parent scope is known at compile time, functions are actually called at runtime; when the anonymous function is called back from inside, say, usort(), PHP has no way to 'know' which scope its variables should come from. Besides, storing the function name in a variable could easily move it away from the scope of the original declaration. Robert wrote that he hadn't been arguing for the preservation of a variable value at the point of function creation; he simply wanted that value to reflect whatever is defined in the parent scope. Sean explained exactly how horrible this would be when it came to debugging somebody else's code, but admitted that a single layer of scope - something like $_PARENT - might be useful.

Richard Lynch held a torch for the introduction of metadata (__FILE__ and friends) for anonymous functions. Wez gently pointed out that his patch actually offers that.

Andi Gutmans suggested compiling the anonymous function itself at compile time with placeholders, i.e. the actual closure would be created at runtime. He proposed a new global variable, $_SCOPE['var'], that would reference the current $var during runtime. For example:

$var = phpversion();
$fancyVer = function () { return "PHP {$_SCOPE['var']}"; };

Regular variables would be treated as they currently are in create_function(). Andi noted that 'fix-up time' - when $_SCOPE is populated - would be faster than compiling the code. Stas replied that this was more or less what Wez had already proposed, and went on to outline how it could be achieved. However, he wasn't certain that bringing full closure support to PHP would be a good thing, pointing out that it could lead to very messy code. Lukas Smith agreed; he felt that polluting the global namespace for something not intended for re-use would be a bad move, and suggested artificial limitations for closure usage. Andi pointed out that the namespace would remain polluted, limitations or no limitations. Wez intervened to say that the $_SCOPE idea wouldn't quite work out:

$funcs = array();
for ($i = 0; $i < 10; $i++) {
    $funcs[] = function() { return $_SCOPE['i']; };
}

assuming that $_SCOPE would take a copy of $i during fix-up. Since the function would only be compiled once, there would only be one place to store the variable values for $_SCOPE, and this would leave the result of calling any of the functions listed in $funcs undefined. Wez could see two ways to resolve this. The first was to generate a unique function name as an alias at each iteration through the loop and use that as a key to access the appropriate stored value(s) using something like get_scope_vars(). The second way was to have a first-class callable type that would store the information in the return value from the function declaration. It would also store a pointer to the op_array and a hash table used to initialize the local scope, based on the information in $_SCOPE. Wez had a personal preference for the second solution, but noted that it would require a lot of work. It would be less invasive to have a callable class type store the information.
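Wez's second option can be approximated in userland to show why it avoids the single-storage problem: one function body is shared, but each value produced in the loop carries its own captured variables. This is a minimal sketch; the class BoundCallable and the function return_i() are hypothetical names, and the real thing would live in the engine (holding an op_array pointer and a hash table) rather than in PHP code.

```php
<?php
// Hypothetical userland analogue of a first-class callable value:
// every instance pairs one shared function body with its own scope.
class BoundCallable
{
    private $body;     // name of the shared, compiled-once function
    private $captured; // this instance's copy of the imported variables

    public function __construct($body, array $captured)
    {
        $this->body = $body;
        $this->captured = $captured;
    }

    // Call the shared body, passing the captured scope as the first argument.
    public function call()
    {
        $args = func_get_args();
        array_unshift($args, $this->captured);
        return call_user_func_array($this->body, $args);
    }
}

// The single function body; its "lexical" variables arrive in $scope.
function return_i(array $scope)
{
    return $scope['i'];
}

$funcs = array();
for ($i = 0; $i < 10; $i++) {
    // A copy of $i is taken on each iteration, so every entry differs.
    $funcs[] = new BoundCallable('return_i', array('i' => $i));
}

echo $funcs[3]->call(), "\n"; // 3
echo $funcs[9]->call(), "\n"; // 9
```

The body is compiled once, as in Andi's scheme, but the per-iteration values live in the returned values rather than in the function, so each entry in $funcs gives a defined result.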

Andi had ideas of his own, 'including some funky parameter passing games', but wanted to bring the discussion back to whether the feature was actually wanted in PHP in the first place, and if so, to what extent?

Andrei Zmievski, arriving late at the ball, supported the initial proposal but added that he'd been wanting a true first-class callable type in PHP for some time. Stas was less certain; he thought the only way to implement it would be to explicitly declare the imported variables. Besides, why would anyone need a callable type? Wez explained: the only other option would be to store closure information in the op_array, which wasn't a good idea because there would be no way to know when it could be freed, so it would have to be stored for the lifetime of the request. Stas agreed that this was a big argument in its favour, but pointed out that a new type would add complexity to the language and tools; it would also require modifications wherever callbacks are used. He proposed a middle way: the callback type could be modified to accept array($object, $name, $arguments), where $arguments are the captured closed variables. That way, no changes would be required anywhere in the source where closure variables aren't needed. There'd still need to be a decision made over the scoping of closures created in a class context...
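Stas's middle way can be sketched as a dispatcher that understands the extended three-element form while leaving ordinary callbacks untouched. The helper invoke_extended_callback() and the Greeter class are hypothetical; PHP itself only accepts the two-element array($object, $method) form, so the third element is interpreted by the sketch, not by the engine.

```php
<?php
// Hypothetical helper: array callbacks are array($object, $method);
// a third element is treated here as the captured ("closed") variables.
function invoke_extended_callback(array $cb, array $args)
{
    if (count($cb) === 3) {
        list($obj, $name, $closed) = $cb;
        // Pass the captured variables ahead of the regular arguments.
        array_unshift($args, $closed);
        return call_user_func_array(array($obj, $name), $args);
    }
    // Two-element callbacks behave exactly as before.
    return call_user_func_array($cb, $args);
}

class Greeter
{
    public function greet(array $closed, $name)
    {
        return $closed['greeting'] . ', ' . $name;
    }

    public function plain($name)
    {
        return 'Hi, ' . $name;
    }
}

$g = new Greeter();
echo invoke_extended_callback(array($g, 'greet', array('greeting' => 'Hello')), array('Wez')), "\n";
echo invoke_extended_callback(array($g, 'plain'), array('Stas')), "\n";
```

Only code that receives the three-element form needs to know about it, which is the property Stas was after.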

A couple of people - "boots" of Smarty fame and PHP user Jim Wilson - wrote that they didn't need closures in PHP anyway, just the anonymous function syntax Wez had proposed in the first place.

Short version: Closures are a different thing altogether.

TLK: GSoC - dbobj [continued]

There was much interest in the native ORM project that Ádám Bankó proposed for GSoC. He posted a link to his project homepage, noting that his test applications currently act as stand-ins for documentation. He also warned anyone downloading the code that the current version is so unstable that it will probably segfault; he'd only just finished modularizing the database layer. He would, however, like to receive crash reports - preferably with some debugging information.

Propel user Tony Bibbs agreed with Lukas that only selected bottleneck areas of PHP code should be ported to C. He felt the only thing worth considering would be a native extension capable of collecting the metadata required by an ORM. Lukas argued that DB abstraction of any kind should stay in userland. Getting it to work usually meant a bit of hackery, so putting it into C would create 'a maintenance nightmare'.

Stas took a couple of Ádám's assertions about the advantages of C with a pinch of salt. Ádám explained; there is, for example, no way for a PHP class to handle:

$myobject->i++;

where MyClass::i doesn't exist, and this should be relayed to some magic function like __set(). He also didn't know how well __get() and __set() behave with references. Using C would allow him to do things like cache the mapping configuration in memory, and it would mean he didn't need to worry about the performance cost of large structures.

Doctrine fan Guilherme Blanco, in the throes of an ORM implementation himself, posted a link to an IBM white paper about the persistence layer. He didn't see how a compiled ORM tool would have any great advantage over one written in PHP, but was all for the idea of having a bundled ORM tool. Ádám wrote that, among other things, having what amounts to a bundled base class written in C (but extensible in PHP) would help make it standard. Jacob Santos agreed that this would be the optimal solution, but Lukas argued again that it was a question of maintainability. A tool to reverse engineer a database schema, for example, really shouldn't be written in C because it needed a low entry barrier; users should be able to quickly fix anomalies on encountering a new or obscure RDBMS. Ádám agreed entirely on this point; he didn't consider database schema discovery a core ORM feature. In fact, his current implementation supplies a callback hook that allows a PHP script to pull out this data from a multitude of sources.

Andrey Hristov suggested writing a reference implementation in PHP and porting it to C as appropriate, citing Marcus' initial SPL implementation as a template for this. Lukas could see this working, as it would make it easier to figure out what should or shouldn't be written in C. Ádám was less certain - he didn't like the idea of doing everything twice over - but Jacob agreed that this approach enables speedy extension development. The important thing was to achieve community consensus over a standard API. Ádám, having put a lot of work into his project, asked list followers to take a look at his example scripts and decide whether his current API was or was not a good starting point. He would be willing to implement whatever API is agreed upon, and wrote the initial abstract for his GSoC application to reflect this flexibility.

Short version: All sounds promising. Fingers crossed.

REQ: getpass

Daniel Rozsnyo, working on a CLI script, wanted to allow the user to type a password without it being echoed back to the screen. He'd found the function he believed he needed for this in unistd.h - getpass() - but had found it marked obsolete in the manpage. Was there any way the getpass() function could be included in the next release of PHP, or should he patch his own copy? The only alternative he could see was to use a small binary and call it using the backtick operator.

Sara Golemon thought it was a really bad idea to wrap an obsolete function as part of the PHP core, but wrote that supplying it as an extension would be a different matter - perhaps even as a PECL extension. That said, it would only work with the CGI or CLI SAPI, and not even then if output buffering came into the equation. Sara also thought it unlikely that getpass() integrates with PAM, meaning that it would fail alongside distributed authentication schemes like LDAP and Kerberos. Overall, the idea wasn't a good one, full stop.

Edin Kadribasic suggested a workaround PHP function for *nix systems:

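<?php

function get_password($prompt) {
    // Remember the current terminal settings so they can be restored
    $ostty = `stty -g`;
    // Turn off echo; fall back to the older "cbreak" form if the first fails
    system("stty -echo -icanon min 1 time 0 2>/dev/null || " . "stty -echo cbreak");
    echo "$prompt: ";

    // Get rid of newline when reading stdin
    $r = substr(fgets(STDIN), 0, -1);

    // Echo was off, so move to a fresh line ourselves
    echo "\n";
    // Restore the saved terminal settings
    system("stty $ostty");

    return $r;
}

$p = get_password("Password");
echo "Password entered: $p\n";

?>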

Daniel hadn't thought of offering his code as a PECL extension. He wrote to Sara explaining that he only really wanted getpass() as a safe password entry for CLI scripts - safe in the sense of no shell history, password visibility or storage. Despite the obsolescence, both the mysql and openssl clients use getpass(), and there are probably other mainstream clients that use it too. Daniel wrote that he would take his code to PECL after the next cleanup - maybe not getpass() directly, but the version from apr_getpass.c, which has the advantage of a fallback implementation.

Short version: Not as crazy as it seemed on first sight.

TLK: GSoC 07 [continued]

Tijnema had been reading the ideas on the PHP project GSoC planning list. The test writing idea would be the simplest, in his opinion... but he had some ideas of his own, too. He'd like to see support for handling audio and video files in PHP, to do music processing, or to create music streams directly from a website. Marcus introduced him to PECL, pointing out that strong C skills are definitely required for that kind of project, and recommending that he choose an idea from the existing list (read: we really, really want those tests). Tijnema's response was that he could only find pecl/oggvorbis.

Richard Lynch mentioned the useful but unmaintained pecl/id3, which he uses to splice ID3 tags onto the front of an MP3 stream on one of his sites. Tijnema became very excited and started listing the libraries that he'd like to see as part of a file format conversion extension. Still, he'd rather implement video support... Alexey Zakhlestin pointed him towards ffmpeg (GPL'd), but Tony Dovgal wrote that nobody sane would do audio encoding and video resizing in PHP. It would result in an impossibly slow page load. There are plenty of open source utilities for converting WAV to MP3 or AVI to OGG... transcoder, lame, oggenc, to name but a few. That said, he'd be happy to see a PECL extension capable of reading video files and grabbing screenshots, and the existing sound file extension in PECL has never been released.

Tijnema wrote seriously that his dreams for audio would extend the limits of PHP; hadn't Tony ever wanted to be a web DJ? As for video files, they are simply sets of frames; any PECL extension capable of reading them has already achieved the hardest part of the conversion process. Tony - who has never had any desire whatever to be a web DJ - wondered whether Tijnema really intended to create videos on the fly? Tijnema didn't see why not. Why not have a movie stream on your homepage? Tony explained gently that he'd need a Cray cluster to handle it if his homepage ever became popular. Vlad Bosinceanu backed Tony; PHP really isn't suited for massive processing tasks. Audio or video processing may be useful in CLI applications, but even there he saw no gain in interfacing highly complex and specialized tasks from PHP. Resizing videos for online use meant encoding the video - and any accompanying audio - to begin with and fiddling with various properties in the process. Even if there happens to be a single library suited to do all this, bringing it to PHP would mean exposing a very complicated API.

Robert Cummings found Tony's idea of general PHP usage very limited, and Richard Quadling backed him in this. It seems it's fairly standard to use the same PHP classes across Web, CLI and GUI environments. That said, he didn't think he'd want to do video encoding with PHP... though it would be nice sometimes to do things in PHP via an extension to an existing library, and he mentioned Delphi's JEDI project as a worthy template for a potential GSoC project. Tony showed him ext/dangerous, which does much the same thing.

Short version: Dreaming's okay, but not on the internals list maybe.

TLK: More GSoC 07...

GSoC 06 participant William Candillon wrote to internals@ saying that he'd like to spend this summer writing an Eclipse plugin for the phpAspect project he produced last year. Sadly, there was no immediate response.

One David Duong had seen the GSoC 07 announcement on php.net, and wrote in search of a mentor for a project he had in mind. HyperWiki would be a hypertext distribution system providing a minimal CMS, the 'gateway', which could be administered by non-programmers. In response to a user's search request, the gateway would provide aggregated search results, possibly alongside a list of the other linked systems searched. The user could add, edit or delete entries on all linked systems.

Although noting that this project doesn't add anything to PHP itself, David intended to go ahead with his proposal on the grounds that it would provide a showcase application for PHP 5, and possibly for PHP 6.

Marcus wrote simply that the deadline for submitting proposals is looming.

Short version: All kinds of everything...

BUG: IMAP/GSSAPI auth failure

IMAP user Mustafa wrote asking whether there had been any movement on the IMAP/GSSAPI authentication issue reported some time ago. Michael Allen responded with a possible solution for the problem using his company's PHP extension - assuming that Mustafa is on an Active Directory network. Mustafa replied that he's actually using MIT Kerberos and OpenLDAP, and dovecot is the IMAP server. He has no problems with GSSAPI except with the PHP IMAP call, which doesn't try plain auth when GSSAPI fails. Mustafa added that ldap_sasl_auth() has no GSSAPI support either. Michael pointed out that ldap_sasl_bind() does in fact support GSSAPI binds with the Kerberos mechanism - he even had an example script for this:

$ldap = ldap_connect($ldap_server);
if ($ldap) {
    ldap_set_option($ldap, LDAP_OPT_PROTOCOL_VERSION, 3);
    ldap_set_option($ldap, LDAP_OPT_REFERRALS, 0);
    if (ldap_sasl_bind($ldap)) {
        $srch = ldap_search($ldap, 'DC=example,DC=com', "(cn=$cn)");
        if ($srch) {
            $info = ldap_get_entries($ldap, $srch);
            for ($i = 0; $i < $info["count"]; $i++) {
                if (isset($info[$i]['distinguishedname'])) {
                    $resp = 'Success: ' . $info[$i]['distinguishedname'][0];
                    break;
                }
        ...

Michael saw no reason Mustafa shouldn't be able to get this working using mod_auth_kerb with the option:

KrbSaveCredentials on

He had noticed in the past, though, that using KRB5_KTNAME to specify a keytab file from which to get credentials doesn't work. Mustafa thanked Michael for the example script and wrote that he'd test later and confirm his findings - but didn't.

Short version: Hard to tell without user feedback...

BUG: String BC break

Christian Schneider wrote to say that he'd found an apparently undocumented change in behaviour:

$a = "foo"; echo "\{$a}";

// PHP 4.4.4: {foo}
// PHP 5.2.1: \{foo}

He wasn't sure when the change had been introduced, but some third-party code using that construct had failed during a PHP 5 migration. Was it an intentional BC break, and if so, shouldn't it be documented in the manual?

Tomas Kuliavas replied saying that it had changed somewhere between PHP 5.1.0 and PHP 5.1.1. The manual page on string behaviour says that curly brackets are not escaped with a backslash, but escaping did work in older PHP versions. Tomas ended his mail with references to two closed bug reports, but Christian wrote that these bugs weren't quite the same as his. Moreover, in PHP 5.1.5 and PHP 5.2.1 the curly braces are escaped; it's just that the backslash is output too. If they weren't escaped at all, the complex syntax would kick in and he'd be seeing \foo. Whatever, the PHP 4 way seemed to him to have the lowest WTF factor. Back to the original question: is this behaviour intentional, or is it a bug?
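For code that has to produce literal braces around an interpolated value on both PHP 4 and PHP 5, one way to sidestep the question is to avoid the "\{$" sequence entirely. This is a minimal sketch of two such forms; it is a workaround, not the behaviour either version intended.

```php
<?php
$a = "foo";

// Neither form depends on how a backslash before "{$" is treated,
// because no "{$" sequence appears inside a double-quoted string.
$concat = "{" . $a . "}";
$formatted = sprintf('{%s}', $a);

echo $concat, "\n";    // {foo}
echo $formatted, "\n"; // {foo}
```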

Short version: Hard to tell without developer feedback...

TLK: Dealing with the old stuff

Tony posted a proposal to change the severity of the error triggered in CVS HEAD when enabling magic_quotes or safe_mode, from E_ERROR to E_WARNING. Since E_ERROR is supposed to be used only for things that leave the Zend Engine in an unstable state, he believed it was misused here. Besides, it's impossible to give a filename and line number when an error is triggered by a php.ini directive. He planned to commit his patch later if there were no objections.

Alexey wrote that he thought it would be better to make users disable those directives manually, but Tony explained that this wasn't the issue. The problem was that the error messages refer to an unknown file.

Johannes Schlüter asked if he could completely remove get_magic_quotes_gpc() and similar functions from HEAD, rather than have them result in a fatal error. Derick Rethans backed his request, and also wrote that there should be a better error message for INI directives - something that would determine the configuration file in which the offending directive was set.

Short version: Thread hijack alert!

CVS: LDAP maintenance era

Changes in CVS that you should probably be aware of include:

  • Zend Engine bugs #40833 (Crash when using unset() on an ArrayAccess object retrieved via __get()), #40899 (memory leak when nesting list()) and #40883 (mysql_query() is allocating memory incorrectly) were fixed [Dmitry, Tony]
  • In ext/imap, bug #40854 (imap_mail_compose() creates an invalid terminator for multipart e-mails) was fixed in 5_2 branch [Ilia]
  • In ext/soap, bug #36226 (Inconsistent handling when passing nullable arrays) was fixed [Dmitry]
  • In ext/spl, bug #40872 (inconsistency in offsetSet, offsetExists treatment of string enclosed integers) was fixed [Marcus]
  • Issues with the long form of CLI options were fixed across all current branches of PHP [Marcus, Johannes]
  • Random crashes seen in ext/ldap should be a thing of the past from 5_2 up [Doug Goldstein]

In other CVS news, Jani Taskinen finally gave up the unequal battle to stay out of PHP development and started committing little bits and pieces once more.

Dmitry made some changes to the Zend Memory Manager 'to guarantee reasonable time for worst cases of best-fit free block searching algorithm'. (That'll be a speedup then.) He also worked on the SOAP extension during the week, and it's now possible to encode arrays using the SOAP-ENC:Array type rather than WSDL. You can activate this by using the option SOAP_USE_XSI_ARRAY_TYPE in your SoapClient or SoapServer constructor.
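The new option is passed through the 'features' entry of the SoapClient (or SoapServer) options array. A minimal sketch, assuming ext/soap is loaded; the endpoint and namespace URI are placeholders, and in non-WSDL mode constructing the client does not contact the server.

```php
<?php
$client = null;

if (extension_loaded('soap')) {
    // Non-WSDL mode: 'location' and 'uri' are required (placeholders here).
    $client = new SoapClient(null, array(
        'location' => 'http://example.com/soap.php',
        'uri'      => 'urn:example',
        // Ask ext/soap to encode arrays using the SOAP-ENC:Array type.
        'features' => SOAP_USE_XSI_ARRAY_TYPE,
    ));
}
```

Several feature flags can be combined with a bitwise OR in the same 'features' value.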

Wez Furlong figured a way out of the problem of local SQLite installs and the clashes they bring under Windows. He added a new DLL, php_pdo_sqlite_external.dll, to the build system, thereby allowing users to provide their own version of sqlite3.dll rather than the SQLite 3 library bundled in the PHP core.

Short version: Doug Goldstein takes responsibility for ext/ldap, long CLI options work, and SOAP-ENC:Array and php_pdo_sqlite_external.dll are born.

PAT: Black box free zone

Richard Quadling produced a patch to fix bug #33664 by preventing the DOS box from firing up under Windows when exec() is called from PHP CLI. There had been no comment at the time of writing.

Short version: Couldn't get much shorter.
