Splitting Zend Framework Using the Cloud

Recently, we released Zend Framework 2.5.0, which features a big change: we split the framework into separate components. Starting with this version, Zend Framework is a metapackage that aggregates the components via Composer. If you go to the zendframework/zf2 repository, you will no longer see any code; the repository now contains a minimal structure with a composer.json that requires the individual components.

Thanks to this split, we can now focus on single components instead of a full package repository. This makes the code easier to maintain, lets different teams focus on specific code, and promotes the usage of components in different projects. We believe that the future of PHP frameworks will be more open to the usage of external libraries, instead of constantly reinventing the wheel. Building on this concept, we now even have PSR-7 in place, which facilitates the usage of controllers and middleware across frameworks and applications.

In order to split the Zend Framework codebase into components, we used a script built by Matthew Weier O’Phinney, the team lead for the project. You can read all the technical details about this script in Matthew’s blog.

To give you an idea of the size of Zend Framework 2: it has ~27k commits, 67 releases, and over 700 contributors, and a clean checkout is around 150 MB. In contrast, a rewritten component repository such as zend-http ended up with ~1.7k commits, 50 releases, and ~160 contributors, and its clean checkout clocks in at 5.4 MB!

Zend Framework 2 has about 50 components, and for each one the script requires about 5-8 hours of execution on an Intel i5-2500 at 3.3 GHz with 8 GB of RAM. Executing the full split on a single computer would therefore require up to 400 hours, or about 17 days! The splits could be run in parallel on one machine with tools like GNU parallel, but realistically this only cuts the execution time to about 150 hours, or 6 days, in the best-case scenario.
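
The arithmetic behind these estimates can be checked with a few lines of bash. This is only a sketch: it uses the 5-8 hour range quoted above and rounds up to whole days, and the variable names are purely illustrative.

```shell
#!/usr/bin/env bash
# Rough serial-time estimate from the figures quoted above.
components=50
hours_low=5    # fastest per-component split
hours_high=8   # slowest per-component split

total_low=$(( components * hours_low ))
total_high=$(( components * hours_high ))
days_high=$(( (total_high + 23) / 24 ))   # round up to whole days

echo "serial run: ${total_low}-${total_high} hours (up to ~${days_high} days)"
```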

Thanks to the help of Corley srl, an Italian IT company, we proposed using cloud infrastructure to run the script in parallel, using 48 EC2 instances from Amazon Web Services (48 instances and not 50, because two components required a different approach).

With this cloud infrastructure, we executed the complete framework split in about 8 hours, instead of 400! From a cost perspective, we spent less than $25, thanks to a tool provided by Gianluca Arbezzano, Software Engineer at Corley, and thanks to the expertise of Corley in the usage of AWS.

The EC2 AWS infrastructure used

To accomplish this split, we used 48 EC2 t2.medium instances. Each one ran the component split script, created a new repository, and pushed the new git history to that repository. Every process generated a log file, which was pushed to an S3 bucket with a relevant AWS policy.

The split of a repository requires a lot of disk access. A trick to reduce execution time is to use a RAM disk, which made the task simple and efficient: we mounted the /root directory as tmpfs, so that everything written there goes directly to RAM.
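
The exact command used is not reproduced here. As a minimal sketch, a tmpfs mount over a working directory usually looks like the command printed below. The 3G size is an assumption (a t2.medium has 4 GB of RAM), and ramdisk_command is a hypothetical helper that only prints the command, so nothing is mounted by accident.

```shell
#!/usr/bin/env bash
# Sketch: build the command that mounts a RAM-backed tmpfs on a directory.
# The 3G size is an assumption, not a value taken from the original setup.

ramdisk_command() {
    local target="$1" size="$2"
    printf 'mount -t tmpfs -o size=%s tmpfs %s\n' "$size" "$target"
}

# Print the command; run it as root on the instance to apply it.
ramdisk_command /root 3G
```

Anything stored on a tmpfs is lost when it is unmounted or the machine stops, which is acceptable here because each instance pushes its results elsewhere before shutting down.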

The script to execute the split in parallel

In order to execute the script in parallel on all the EC2 instances, we used gianarb/zf-parallel-split, which provides a script that uses the AWS PHP SDK and a few lines of bash.

The tool uses a PHP configuration file; we stored the list of Zend components to split in its components key.

This configuration file was consumed by a split.php script that created an EC2 instance for each component. We used the Ec2Client class from the AWS SDK to create the EC2 instances in a simple loop.
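
The PHP listing is not reproduced here. As a rough equivalent, the sketch below does the same job with the AWS CLI instead of the PHP SDK, which is a substitution and not the method used in split.php. It assumes a local script.sh containing a hypothetical `__COMPONENT__` placeholder, uses an illustrative two-item component list, and defaults to a dry run that only prints what would be launched.

```shell
#!/usr/bin/env bash
# Sketch: one EC2 instance per component, via the AWS CLI (not the PHP SDK).
# With DRY_RUN=1 (the default) nothing is launched.
DRY_RUN="${DRY_RUN:-1}"
components=(zend-http zend-form)   # illustrative subset of the component list

launch() {
    local component="$1" instance_id user_data
    if [ "$DRY_RUN" = "1" ]; then
        echo "would launch a t2.medium instance for ${component}"
        return
    fi
    # Fill the hypothetical __COMPONENT__ placeholder in script.sh.
    user_data="$(sed "s/__COMPONENT__/${component}/g" script.sh)"
    instance_id="$(aws ec2 run-instances \
        --image-id ami-d05e75b8 \
        --instance-type t2.medium \
        --instance-initiated-shutdown-behavior terminate \
        --user-data "$user_data" \
        --query 'Instances[0].InstanceId' --output text)"
    # Name the instance after its component so it is easy to spot in the console.
    aws ec2 create-tags --resources "$instance_id" \
        --tags "Key=Name,Value=${component}"
}

for component in "${components[@]}"; do
    launch "$component"
done
```

Running it with DRY_RUN=0 requires configured AWS credentials and starts billable instances.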

The createTags call is not mandatory, but it provided a nice benefit: we used it to add a name to each instance in order to watch the full process on the AWS Web Console:

[Figure: Zend Framework split using AWS]

runInstances is the command that starts a new instance; ami-d05e75b8 specifies an Ubuntu 14.04.2 image. InstanceInitiatedShutdownBehavior makes the instance terminate on shutdown, so when the split has finished, the instance is removed.
UserData lets you supply a script that runs after the instance starts; in our case, we used a script.sh file.

This script contains commands to install the environment (PHP 5, curl, Git, the AWS CLI, etc.), clone the split script from GitHub, execute the split, push the resulting component to GitHub, copy the log file to an S3 bucket, and shut down the EC2 instance (halt).
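
The sequence of steps can be sketched as follows. This is not the original script.sh: the repository URL, the split.sh entry point, the bucket name, and the `__COMPONENT__` placeholder are all hypothetical, and the script defaults to a dry run that only prints each step.

```shell
#!/usr/bin/env bash
# Sketch of a user-data script following the steps described above.
# Repository URL, bucket name, and split command are placeholders.
DRY_RUN="${DRY_RUN:-1}"          # 1 = print the steps instead of running them
COMPONENT="__COMPONENT__"        # hypothetical placeholder, filled in per instance
LOG="/root/${COMPONENT}.log"     # the split is assumed to write its log here

run() {
    # Print the command in dry-run mode, execute it otherwise.
    if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi
}

run apt-get update
run apt-get install -y php5-cli curl git awscli
run git clone "https://github.com/example/split-script.git" /root/split   # placeholder URL
run /root/split/split.sh "$COMPONENT"                  # hypothetical entry point
run git -C "/root/${COMPONENT}" push origin master     # assumes the remote is already configured
run aws s3 cp "$LOG" "s3://example-split-logs/${COMPONENT}.log"           # placeholder bucket
run halt                                               # shutdown triggers termination
```

Because the instance is launched with terminate-on-shutdown behavior, the final halt is what stops the billing, so a real script would want to reach that line even when an earlier step fails.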

All EC2 instances were created and terminated by this script, and cloud usage was optimized, with no waste of time or money!

Conclusion

Thanks to cloud infrastructure like AWS, we were able to execute a workload of 17 days in a matter of hours. Cloud infrastructure is an excellent fit for this kind of one-off, highly parallel job, and we encourage all PHP developers to consider it for similar applications. We also demonstrated that designing a script to create an instance, execute the code, store the result, and destroy the instance is simple, requiring only a few lines of PHP and bash.

The Zend Framework team thanks Gianluca Arbezzano for his tremendous work, and the entire team of Corley srl for sponsorship of this project.