30 Minutes with Jeremy Johnstone


p. I’ll have to admit that when I sat down to interview “Jeremy Johnstone”:http://www.jeremyjohnstone.com/blog/ at OSCON, I didn’t know who he was or anything about him. With most of the other interviewees, I at least knew who they were and what they had done with or for PHP. I had nothing on Jeremy; all I knew was that he was speaking at OSCON and that he worked at Yahoo. As it turns out, it was a very interesting interview, though not for the usual reasons. If you want more, you’ll have to come inside and read the interview.

p. **Jeremy, let’s start off by talking about your experience. Specifically, how did you get into programming and from there, how did you get into PHP?**
Basically, I got into programming somewhere around twenty years ago. I actually started programming with BASIC working on an Apple IIe. I’ve always been good at math and problem solving and both of those came into play when I started programming.

p. I started using PHP roughly around 1999. I personally don’t like Perl just because there are some idiosyncrasies with the language. Even the most expert Perl programmer would have a difficult time reading someone else’s Perl code just because there are so many different ways of doing things. I was looking for something else and PHP seemed to be a good fit at the time.

p. **Let’s talk about your work at Yahoo for a bit. I know you have been working on porting the Address Book application to PHP. Tell us about that experience and what else you are working on. Also, what are the major challenges you are facing, working on an application of this scale?**
Online address books are a very interesting topic. Despite what people who haven’t written one may think, it is a complex and varied area with a lot of gotchas. One of the big challenges in porting it to PHP was the scale at which Address Book operates at Yahoo!. The funny thing is that many people may not even realize they have ever used it, yet they do every day. Each time you go to Yahoo! Mail and send an email, the autocomplete that happens in the To: field is done by Address Book. Yahoo! Messenger also uses Address Book to store the personalized details of a contact (like their display name if it’s not their Yahoo ID). And that’s just the tip of the iceberg as far as integrations go. Given that Yahoo! Mail is by far the #1 email provider in the world, that puts a large responsibility on us.

p. **How did you handle making the PHP version of Address Book scale to meet these needs?**
We did a lot of heavy-duty analysis to find out where our bottlenecks were, where we weren’t getting good performance out of the code, and where the code used excessive memory. The volume of transactions we were experiencing made it important that we squeeze every bit of performance out of the code we could. In some cases the changes we were making would only result in a three-millisecond savings. Given the level of traffic we were experiencing, three milliseconds adds up to a huge savings. So getting everything as optimized as possible has been a very big challenge for us.
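
p. To put the three-millisecond figure in perspective, here is a rough back-of-the-envelope sketch in Python. The request volume is an assumption chosen purely for illustration; the interview does not give Yahoo's actual traffic numbers, and the function name is hypothetical.

```python
# Back-of-the-envelope estimate of what a small per-request saving adds up to.
# REQUESTS_PER_DAY is a hypothetical figure, not a number from the interview.
REQUESTS_PER_DAY = 100_000_000
SAVING_MS = 3  # the per-request saving mentioned in the interview


def daily_hours_saved(requests_per_day: int, saving_ms: float) -> float:
    """Return hours of processing time saved per day for a per-request saving."""
    total_ms = requests_per_day * saving_ms
    return total_ms / 1000 / 3600  # milliseconds -> seconds -> hours


if __name__ == "__main__":
    hours = daily_hours_saved(REQUESTS_PER_DAY, SAVING_MS)
    print(f"{hours:.1f} hours of processing time saved per day")
```

p. Under that assumed load, a change too small to notice on a single request frees up several machine-days of processing time every day, which is why savings of this size can be worth chasing at scale.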

p. **What tools do you use to profile your code for optimizing?**
We use several different tools. A lot of the time we use valgrind and gdb to trace through the code because they give us the granularity we are looking for and allow us to do actual execution tracing. We also use Xdebug because it gives us a great “userland” perspective of what is going on inside PHP, along with a number of tools that are internal to Yahoo.

p. **Ok, let’s talk about versions for the moment. Has Yahoo moved to PHP 5 or is most of your development still done in PHP 4?**
We are actually using PHP 5 now for most new development. I don’t know the exact numbers but I do know that almost all new properties are built on PHP 5 and all existing properties are being migrated over in the near future.

p. **What is it about PHP 5 that is driving Yahoo to move everything to it?**
There are a number of different features that have driven the migration. The advanced functionality in PHP 5, especially the object model, has been very useful. The improved performance characteristics and memory management have also been good for us. We skipped the PHP 5.0 release and moved straight to PHP 5.1. There were some major changes between the two that were very beneficial for us.

p. **Let’s talk about extensions to PHP. Can you tell us about some of the extensions you’ve written?**
Sure. Internally for Yahoo, I’ve written different back-end layers that wrap external C++ libraries that we use within the company. Some of these are things like integration points with Yahoo Mail, our user database, and Yahoo Messenger. These were all projects that nobody had gotten around to yet, so I went ahead and wrote them.

p. Externally, I wrote an extension that does text to speech for PHP. That way you can actually pass it a string and it will create a wave file of the audio.

p. One extension that I’m still working on is an image detection/processing application. It wraps an Intel library. To use it, you pass it an image of say five people. It will process the image and return you a list of the bounding boxes for the five people’s faces.

p. **Other than Address Book, what other projects are you working on at Yahoo?**
I work on a variety of projects at Yahoo. Right now I’m leading the charge on a disaster recovery project so that the next time we have a disaster like Katrina, we’ll be better prepared and in a better position to help people. We did a great job last year, and I helped with a lot of that effort, but I would have liked to see even more happen in that area.

p. !>http://static.flickr.com/31/61373837_eac4b2f63f_m_d.jpg! **When you say “Disaster Recovery”, you are not talking about “Business Continuity” disaster recovery. Can you explain a little more about the type of Disaster Recovery you are talking about?**
Let me give you an example from Katrina to try and explain it. One of the biggest problems that people faced after Katrina hit was being able to reconnect with their loved ones. There were hundreds of different sites out there and none of them talked to each other. Some of the issues we faced were that each site had its own data format. Some sites were collecting data that they were not supposed to because of privacy concerns. Some sites would ask for your social security number so they could track you. These types of issues made it difficult for people to know which sites to use to try and reconnect. One of the things I’ve done over the last several months is write a data spec, and now I’m getting the different players to rally behind it. There are a number of organizations like IBM, the American Red Cross, Google, and even Microsoft that are involved in some way and hopefully will officially support the standard once it’s complete. There are also a lot of independent NGOs and non-profits that are getting involved with it.

p. Basically what we are building there is a “humanitarian data spec”. The idea is the next time we have a disaster, each of the organizations can set up their own sites but we provide them with the list of fields to ask for and the XML messages so that they can submit the data up to our server and search the data that everyone is submitting.
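
p. As a rough illustration of the idea of an agreed list of fields plus XML messages, here is a minimal Python sketch. The element names, the @ALLOWED_FIELDS@ list, and the sample record are all hypothetical; the interview does not describe the actual fields or message format of the spec.

```python
import xml.etree.ElementTree as ET

# Hypothetical field list. The real spec's fields are not given in the interview.
ALLOWED_FIELDS = ("given_name", "family_name", "last_known_location", "source_site")


def build_person_message(record: dict) -> str:
    """Serialize a record to XML, keeping only fields the shared list allows."""
    person = ET.Element("person")
    for field in ALLOWED_FIELDS:
        if field in record:
            ET.SubElement(person, field).text = record[field]
    return ET.tostring(person, encoding="unicode")


if __name__ == "__main__":
    sample = {
        "given_name": "Jane",
        "family_name": "Doe",
        "last_known_location": "Houston, TX",
        "source_site": "example.org",
        "ssn": "000-00-0000",  # not in the agreed list, so it is never sent
    }
    print(build_person_message(sample))
```

p. Restricting the message to an agreed field list means every participating site emits the same structure, and fields outside the list (such as the social security number in the sample) are simply never transmitted.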

p. **So you are building the backend infrastructure for these companies but they can still use their front end to talk to it.**
So to speak, yes. One of the biggest things is that people need to be able to have their own sites. Organizations like the various Baptist Church organizations want to have their own site and have their constituency visiting their site to find loved ones. The problem with that is you then have all these disparate islands of information. Having a central repository for this information – which Yahoo is in the process of developing – plus having data exchange standards that everyone agrees to use allows you to bring all the islands together. The net effect is no matter which site you go to, Microsoft, The Red Cross or Yahoo, you are going to get the same information.

p. **Has this specification been published publicly?**
Yes, although the specification is still in its initial draft form. When it is ready for publication, I’ll post information about it on my blog “http://www.jeremyjohnstone.com/blog”:http://www.jeremyjohnstone.com/blog.

p. Along those same lines, another Yahoo employee named James Jones and I spent a lot of time down in the Houston Astrodome, along with 30 other Yahoo volunteers. Since then we have met with the Department of Homeland Security, the Federal Emergency Management Agency, and even representatives from the White House to find out where we could help service these needs better. Basically, we came away from those meetings with the idea that the number one priority was family tracing. People need to be able to find their loved ones.

p. Another problem we are trying to address is the flow of information between shelters. There were so many shelters out there that not even the DHS and FEMA knew where all the shelters were. So having a central database of all the various shelters is critical.

p. Another issue we are looking at is donations. The Red Cross received so many “in-kind” and monetary donations that they couldn’t process them all. They turned away a large number of the in-kind donations they were offered because they just didn’t have the facilities to handle them. If all of those in-kind donations had been placed in a searchable database, all these smaller organizations that people may not have heard of could have requested those donations and put them to use. We are talking about things like cots, beds, food, and clothing, all the things that were in short supply, but we didn’t have the infrastructure in place to manage them. The Red Cross just didn’t have the facilities to take in one million cans of canned goods, yet smaller organizations would have been more than happy to process the donations and see that they were distributed.

p. **When you were down there in the Astrodome, were you working on the technical infrastructure?**
Yes, I was mainly focused on the technical infrastructure. I also met with a lot of the people who were displaced, trying to help them find their loved ones. It was that work that gave me a real feel for the problem at hand and helped me identify exactly what the problem was. I actually sat down the Sunday morning before Labor Day and thought “there is a problem here”. I couldn’t search through twenty or thirty different sites to help these people find their loved ones and still service all the people we had. I talked with David Filo throughout the holiday weekend and he was very instrumental in helping make things happen. I can’t thank him enough, because without him a lot of the things I did that weekend wouldn’t have been possible. It turns out he was already working on a “scraper” that quickly went through and scraped all the sites. It then loaded all the data into one database so we only had to go to one place to view it.

p. **So in the time of a disaster, as shelters are opened, they would register with your service? As people enter the shelter they would register with their name and a picture so that their families can find them?**
Yes, that’s the basic idea.

p. **So when a shelter is opened, how do you ensure that they have the proper equipment to participate in this system?**
That’s always an issue. The Red Cross, through the Coordinated Assistance Network (C.A.N.) will help distribute some of the equipment necessary to the smaller organizations. Yahoo doesn’t have specific plans yet on how we are going to help on the infrastructure in the field. It’s not an area that we work in. We are working to build partnerships with people who can handle that.

p. **Is this an official Yahoo sponsored project?**
Yes it is. Pieces of it aren’t fully public yet but it is something that Yahoo has been dedicated to for quite a while.

p. **Normally, I ask people a question like “What are you passionate about” but it’s obvious from talking to you that this is something you are passionate about.**
There are a lot of things that I’m passionate about but this is something I feel I can make a real difference in the world with. I saw first-hand how much of a difference I was able to make during Katrina. Being able to continue that is just great.

p. **Wow, that’s a great project and you and Yahoo need to be commended for undertaking it. I wish we could sit here and talk some more about it. However, we need to get back to PHP and start wrapping things up. Let’s talk a little more about your extensions. You mentioned that you work on several in-house extensions. Do you or Yahoo have any plans to release these extensions?**
Actually, one of the extensions that I wrote was the text-to-speech extension. I wrote it on my own time but within the company. I had some ideas for its uses, but nobody was too interested, so I asked if I could release it out into the world. I went through the process and was allowed to release it.

p. **Ok, let’s talk about PHP 6 for a moment. What is the one new feature that is coming down the pipe in PHP 6 that really excites you?**
I’d have to say that, hands down, the most exciting thing is the Unicode support that members of the core team, Andrei and others, are working on. One of the big challenges that we have with Address Book, in addition to the scale issue, is that we release in a huge number of markets, with multiple languages in some markets. One of the big issues we’ve had in the past with Address Book is that we were storing information in the native encoding but didn’t always store what that encoding was. So we’ve had massive headaches with character encoding.

p. Moving forward, we are transferring everything into a UTF-8 datastore. We will also be using UTF-8 on the front end. Handling these character encoding issues with PHP 5.1 is a bit of a challenge, but thankfully I was able to snag code from PHP 6 to make that less painful. I try not to reinvent the wheel as much as possible, so being able to backport things like ICU transcoding or string comparisons from PHP 6 was a major time saver.
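
p. The problem of stored bytes without a recorded encoding can be shown in a few lines. This is a minimal Python sketch, not the ICU-based PHP code mentioned above; the sample name, the two candidate encodings, and the @to_utf8@ helper are illustrative assumptions.

```python
# Why an unlabeled byte string is ambiguous, and how transcoding to UTF-8
# works once the source encoding is known. The sample name and the two
# candidate encodings are illustrative assumptions.

def to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Decode bytes using their known source encoding and re-encode as UTF-8."""
    return raw.decode(source_encoding).encode("utf-8")


# "José" stored in ISO-8859-1 (Latin-1): the accented letter is one byte, 0xE9.
stored = "José".encode("iso-8859-1")

# With the correct encoding label, the name survives the conversion.
correct = to_utf8(stored, "iso-8859-1").decode("utf-8")

# With a wrong label (here ISO-8859-5, a Cyrillic encoding) the conversion
# still succeeds, but the same byte silently becomes a different character.
wrong = to_utf8(stored, "iso-8859-5").decode("utf-8")

if __name__ == "__main__":
    print(correct)
    print(wrong)
    print(correct == wrong)
```

p. Because the wrong guess raises no error, the corruption only shows up when someone reads the data, which is why recording the encoding alongside the bytes (or normalizing everything to UTF-8) matters.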

p. **Ok, let’s wrap up with a couple of questions that I ask everybody. First, what new technology on the horizon really gets you excited?**
The one technology that gets me excited has, unfortunately, been on the horizon for a long time now and it’s still not here. I want broadband access to the Internet anywhere on the planet. If you know anything about the geography of Silicon Valley, you know that there’s a lake here called “Coyote Lake”. I want to be able to sit on the shore of Coyote Lake with my laptop and have multi-megabit speeds. Several providers have made progress of late, such as Sprint and Verizon with EV-DO and Cingular with UMTS, but it still isn’t quite there.

p. **Is there a company out there that when you learned they were using PHP, it surprised you?**
Yes, a buddy of mine works at Panasonic. He’s been working for the past year on an in-flight computer system that gives passengers touch-screen access to things like browsing the web and watching videos. It’s actually written using the entire LAMP stack and a custom version of Firefox.

p. **Finally, what one blog or website do you read on a daily basis?**
It would have to be a tie between engadget.com and Jeremy Zawodny’s blog.

p. I’d like to thank Jeremy for the time he gave to do this interview and give a special thanks to Yahoo for their support of PHP. It’s obvious after talking with several people that work there that Yahoo pays more than just lip service to the concept of giving back to the community. They also understand that giving back doesn’t always mean contributing code.
