p. This is another entry in the series of interviews I did at OSCON 2006. This time, I was privileged to sit down with Andrei Zmievski and talk about Unicode, Yahoo and other PHP topics. Andrei is a native of Uzbekistan and came to the US to study when he was 16 years old. He now works at Yahoo on their Infrastructure team and is pursuing a Master’s degree in Linguistics.
p. The interview started with Andrei sitting patiently as I mangled his last name. Here’s how the rest of the interview went.
p. **Andrei, since you are so heavily involved in it, let’s talk about Unicode in PHP 6. First, can you tell us why Unicode is important for PHP?**
Well, really, Unicode should have been in PHP five years ago. It’s too bad that there was no real effort or even desire to sit down and implement it. It would have made a lot of things easier for people who have to deal with building applications in either non-English/non-US locales or applications that have to provide internationalized data interfaces. Because it wasn’t an issue five years ago, there have been a lot of workarounds created since then; things like mbstring for example, that we have to get rid of now.
p. The real reason we are introducing Unicode support in PHP is that every high-level, web-oriented language should have that built in. The reason is pretty clear. Processing internationalized, multilingual text requires really good tools. They have to be on the language level to be really useful. The impetus behind that was mostly that Yahoo has a lot of properties that are international. We were going through the same issues other developers were going through, where you have to create workarounds and use disparate libraries to support internationalized applications. Having that built into PHP itself would make things a lot simpler.
p. For my part, I wrote the initial proposal suggesting that Yahoo should contribute to that effort through my time and having me work on it. That’s how it started.
p. **Thank you for that background. Now let’s talk about the project itself. I think first on everyone’s mind is of course, how is the project coming along?**
It’s going pretty well. There was a bit of a lag earlier in the year because I was concentrating on some other things at Yahoo; so apparently was Zend. However, in the last couple of months, the effort has really picked up. We are moving forward at a good pace.
p. My goal is to have some sort of Unicode preview release of PHP 6 in the fourth quarter of 2006. Hopefully the release will be in the middle of the quarter somewhere. Earlier would be greater. What that means is there is a list of extensions that will be ported to support the Unicode API. This will be twelve or fifteen of the first-tier extensions. That, combined with the code support for Unicode in the language will be enough for us to put out a PHP 6 preview release. This will give people something to work with and give us feedback on. It won’t be beta, it won’t even be alpha; it’s just the Unicode preview release.
p. **Great, I know everybody will be looking forward to the preview release to begin playing with Unicode and testing it. When you are implementing Unicode in PHP, what is the biggest obstacle you are finding?**
Really, it’s the fact that internationalization is not so much a technology as it is a mindset. You do have to educate yourself and others about it and what it really means. You have to throw away some preconceptions like “We’ve always done things this way so why can’t we continue doing that”. The answer is, of course, that the proper way to do it is the more difficult one most of the time. There has been resistance on some fronts against making things more in line with how internationalized support should be in PHP. But people are now accepting of that thinking and moving in that direction.
p. Really changing the mindset is the hardest part but it is going to happen. People understand that it’s an international world we live in and not a single country.
p. **Can you talk specifics with us when it comes to the conversion? How far along is it, what’s left to do?**
As far as the core PHP and the Zend Engine are concerned, I would say that things are probably ninety to ninety-five percent done there. The challenge at this point is to get those first-tier extensions converted over. There’s been some good progress there; XMLReader has been converted and people are working on other extensions. We are also going through the list of standard PHP functions and upgrading those as well. I’ve been doing that on my own; hopefully others will join in.
p. Another part of that is documentation. Certain functions will change their behavior slightly because they are Unicode capable now. That needs to be documented and explained in the PHP manual. Additionally there needs to be an introduction to Unicode, what it means to PHP and how things will work in PHP 6. That hasn’t been written up yet but I think there is movement on the documentation front to do it.
p. **Let’s talk a bit about your work at Yahoo if we can. Can you tell us about some of the projects you work on there?**
At Yahoo, aside from working on PHP, I work with other Yahoo developers creating custom patches for Apache versions 1.3 and 2.0. We work on adapting it to the needs of Yahoo.
p. **I know that Yahoo still runs mainly PHP 4. Do you have a rollout schedule for PHP 5?**
A lot of Yahoo properties are migrating to PHP 5 already. I don’t have the numbers here in front of me but we are moving to PHP 5.
p. **Ok, let’s turn our attention now to the future for a bit. Looking beyond PHP 6, what do you see in PHP’s future?**
You know, if I knew that I’d be betting on the stock market a lot more than I do now. PHP has always been very good at adapting to emerging technologies. It’s also very good at being the glue between those new technologies and the Web. So I’m sure that whatever new technologies are released in the next few years, PHP will be quick to adapt.
p. In the case of Unicode, it’s actually an old technology. As opposed to something like AJAX, Unicode has been around for ten or fifteen years. So it’s a little different there. We are adapting to an older technology instead of an emerging one. In the case of all the other technologies, I think PHP has done a great job so far. As for PHP 7, you can’t really say what it will be other than “what the users want it to be”. The users still drive a lot of the development through their requests and discussions.
p. **Along those same lines, is there any one piece that you think PHP is missing?**
Personally, from a language perspective, I think PHP needs support for functions as first-class objects. Closures would also be good. Those are my personal choices. From a web technology perspective, I really don’t think it’s missing anything.
p. **Obviously, by employing yourself, Rasmus, Sara Golemon, Jeremy Johnstone and other high-profile members of the PHP community, Yahoo has committed itself to PHP. Why did they select PHP to be their glue language?**
I don’t know if you’ve seen “Michael Radwin’s presentation”:http://www.radwin.org/michael/blog/2005/10/php_at_yahoo_presentation_.html on that topic. If you haven’t, it’s available online. Basically, Yahoo had a lot of internal technologies for developing front-end pages or sites. As with any technology, each of these required an investment in maintenance, training, developing tools, testing, etc. It got to the point where things were getting fragile because we had to maintain all of that. We began looking for something to replace it. We wanted something that had a lot of mindshare so we could easily hire developers that were already trained. We also wanted something that was actively being maintained and had a future so that Yahoo wouldn’t have to bear the burden of developing it alone. Finally, we wanted something that already had a proven track record. So we went through what was out there, did some testing and settled on PHP.
p. **So, looking backwards a bit now if we may, how did you come into the PHP community?**
I started in PHP when I was working for a small web company developing an online publishing system. We were using a vendor’s proprietary language and database. We moved to PHP for the same basic reasons that Yahoo did: we wanted something that was being actively maintained and had richer functionality.
p. At the time Covalent was distributing Apache with PHP/FI 2. We looked at that and it looked pretty good, but we didn’t do anything with it until about 6 months later when PHP 3 came out. That was a big improvement so we started studying it. Once we started migrating our applications to PHP, I started noticing some things that were lacking in PHP itself. I took a look at the API and thought, ok, I can actually write the missing pieces myself. I started doing that internally first. Then, as I was reading through the mailing list, I saw that people were submitting patches, so I decided to start helping out with things. I wrote the WDDX extension at first and then just kept on contributing more and more.
p. **Wrapping up, let’s look at technology in general. What new technology is out or coming out that really excites you?**
My personal favorite, it’s just related to things I study and like, is computational linguistics and natural language processing. This can be applied to a variety of problems from search engines to query parsing, information retrieval, information inference, this type of thing. I think it’s going to be used more in the next two to three years than it has been so far.
p. **Finally, what is the one website or blog you have to read every day?**
Planet-php.org is a great aggregator for PHP blogs. However, I’m really interested in photography; there’s a site called “The Daily Dose of Imagery”:http://wvs.topleftpixel.com/ where this photographer guy posts an image every day. I really like that one.
p. As with everyone who gives me 30 minutes of their life, I’d like to say thank you to Andrei for taking the time to talk with me. I can’t wait for the PHP 6 Unicode preview release.