The New PHP Indexer in Zend Studio 13.5

Indexers are fundamental for IDEs that provide features like code assist, validation and searching in the source code. The indexer breaks the source code down into basic lexical units, such as variables, function names and parameters, and persists them in an organized data structure: the index. The index is then queried whenever you invoke code assist or request a search. Therefore, the performance of building and querying the index is crucial for the performance of the IDE’s main features.
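To make the idea concrete, here is a minimal Python sketch of an inverted index over PHP lexical units. It is not Zend Studio’s implementation: a few regular expressions stand in for a real PHP parser, and the names `build_index` and `complete` are hypothetical. Each (kind, name) pair maps to the locations where it occurs, so a code-assist query becomes a lookup over indexed names instead of a rescan of the source files.

```python
import re
from collections import defaultdict

# Hypothetical stand-in for a real PHP parser: a few regexes that extract
# function, class and variable names. A real indexer uses a full parser.
PATTERNS = {
    "function": re.compile(r"\bfunction\s+(\w+)"),
    "class": re.compile(r"\bclass\s+(\w+)"),
    "variable": re.compile(r"\$(\w+)"),
}


def build_index(files):
    """Map (kind, name) -> sorted list of (file, line) locations."""
    index = defaultdict(set)
    for path, source in files.items():
        for lineno, text in enumerate(source.splitlines(), start=1):
            for kind, pattern in PATTERNS.items():
                for match in pattern.finditer(text):
                    index[(kind, match.group(1))].add((path, lineno))
    return {key: sorted(locations) for key, locations in index.items()}


def complete(index, kind, prefix):
    """Code-assist style query: all indexed names of one kind starting with prefix."""
    return sorted(name for (k, name) in index if k == kind and name.startswith(prefix))


files = {
    "Cart.php": "<?php\nclass Cart {\n    function addItem($item) {}\n    function addCoupon($code) {}\n}\n",
}
index = build_index(files)
proposals = complete(index, "function", "add")  # ['addCoupon', 'addItem']
```

The linear scan in `complete` keeps the sketch short; search engines such as Lucene keep their terms sorted, so prefix lookups do not need to visit every indexed name.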

For many years Zend Studio used a PHP indexer based on the H2 database. H2 is a fast, lightweight database that is well suited for embedding, and it was a good fit for Zend Studio. However, the H2-based indexer was designed when the market was dominated by single-core CPUs, which is no longer the case. When we tried to optimize the indexer for multi-core CPUs, we hit the limitations of the H2 database: although H2 supports multi-threaded access, querying the database remained the bottleneck and we could not fully utilize the CPU. So we had to look for another solution.

New PHP Indexer

Zend Studio 13.5 introduces a new PHP indexer based on Apache Lucene, a high-performance text search engine library designed to index large amounts of text data, such as your PHP projects. Thanks to the superb indexing capabilities of the Lucene engine, the new release of Zend Studio provides a noticeable performance boost as well as a better overall user experience. The indexer now makes full use of the CPU, and code assist no longer freezes while the index is still being built.


Below are the most significant improvements.

Indexing

All operations related to creating, rebuilding and incrementally updating the PHP index are now significantly faster. Creating the index for the first time takes about 40% less time (about 1.7 times faster), while recreating the index takes 70% less time (more than 3 times faster). What’s more, the index takes up significantly less space on the file system: two to three times less than the H2 database.

Searching

The new indexing engine also affects the performance of the various search tools available in Zend Studio. In general, search times are comparable to those of the previous engine. However, in some cases the Lucene-based engine is much faster. This is especially true for searches that return a large number of matches, because the new engine retrieves the matched data much faster.

Semantic Analysis

Because semantic checks query the index heavily, a significant performance improvement can be noticed here as well, especially for the checks marked as “slow check”. Thanks to Lucene’s reliable multi-threading support, the semantic checks can use all CPU cores, which can make them several times faster to execute. For example, running all checks on four cores with the Lucene-based index took 116.1 s, compared to 289.9 s on a single thread and 173.8 s with the H2-based index on four cores.
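Semantic checks parallelize well because each file can be checked on its own while all workers only read from a shared index. The Python sketch below illustrates that work split; it is a hypothetical example, not how Zend Studio schedules its checks, and `DECLARED`, `check_file` and `run_checks` are invented for illustration. Note that in CPython the global interpreter lock keeps these threads from executing Python code in parallel, so the sketch shows the structure of the work rather than the speedup itself.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical read-only index: functions declared anywhere in the project.
# A frozenset is immutable, so any number of threads can query it without locks.
DECLARED = frozenset({"addItem", "addCoupon", "checkout"})


def check_file(path, calls):
    """Toy semantic check: report calls to functions that are never declared."""
    return [(path, name) for name in calls if name not in DECLARED]


def run_checks(files, workers=4):
    """Check every file independently, spreading the files across worker threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(check_file, path, calls) for path, calls in files.items()]
        # Collect and sort so the report does not depend on thread scheduling.
        return sorted(problem for future in futures for problem in future.result())


files = {
    "Cart.php": ["addItem", "removeItem"],
    "Order.php": ["checkout"],
    "Payment.php": ["pay"],
}
problems = run_checks(files)  # [('Cart.php', 'removeItem'), ('Payment.php', 'pay')]
```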

User Experience

The above performance improvements lead to a better overall user experience:

  • Importing large projects takes less time, both to build the index and to report validation problems
  • Cleaning and validating PHP projects is much faster
  • Searching the PHP source code is noticeably faster, especially when there are many search results
  • Code assist is no longer blocked by the indexer: proposals can be looked up and offered even while the indexer is writing data to the index. Proposals are computed based on the current index “snapshot”.
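Lucene readers see a point-in-time view of the index, which is what allows queries to run while new documents are being written. The Python sketch below illustrates that general idea with a copy-on-write snapshot. It is a hypothetical toy (the `SnapshotIndex` class is invented here), not Lucene’s or Zend Studio’s implementation: readers never wait for the writer, they simply query whichever snapshot was published last.

```python
import threading
from types import MappingProxyType


class SnapshotIndex:
    """Hypothetical toy index: readers query an immutable snapshot, writers publish new ones."""

    def __init__(self):
        self._snapshot = MappingProxyType({})
        self._write_lock = threading.Lock()  # serializes writers; readers never take it

    def snapshot(self):
        """Return the current read-only view; later writes do not change it."""
        return self._snapshot

    def add(self, entries):
        """Add (name, location) pairs by building and publishing a new snapshot."""
        with self._write_lock:
            updated = dict(self._snapshot)  # shallow copy of the published view
            for name, location in entries:
                # Tuples are immutable, so values shared with older snapshots stay intact.
                updated[name] = updated.get(name, ()) + (location,)
            # Rebinding one attribute publishes the new snapshot in a single step (in CPython).
            self._snapshot = MappingProxyType(updated)


index = SnapshotIndex()
index.add([("addItem", "Cart.php:3")])
view = index.snapshot()                   # code assist works on this view...
index.add([("addCoupon", "Cart.php:4")])  # ...while the indexer keeps writing
print(sorted(view))                       # ['addItem']
print(sorted(index.snapshot()))           # ['addCoupon', 'addItem']
```

Copying the whole dictionary on every write keeps the sketch short but would be costly for a large index; Lucene instead writes new data into additional immutable segments.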

Benchmarks

The table below compares the performance of the Lucene-based and H2-based indexers for some basic use cases. The result for each use case is the average of five consecutive measurements.

Test environment:

  • Software: Zend Studio 13.5 EA (32-bit), Magento 2 project, Windows 10
  • Hardware: Core i5 CPU (4 cores @ 3.30 GHz), 16 GB RAM, SSD

Times are in seconds; lower is better.

Use case                                                        Lucene (s)  H2 (s)
Building index (adding/importing project)                             37.3    65.3
Re-building index (cleaning project)                                  29.5    98.7
Semantic analysis (default checks), single thread                     50.2    46.7
Semantic analysis (default checks), multiple threads (4 cores)        21.9    19.2
Semantic analysis (all checks), single thread                        289.9   380.5
Semantic analysis (all checks), multiple threads (4 cores)           116.1   173.8
Finding all type declarations                                         0.18    0.37
Finding all function/method declarations                              0.54    1.55
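The relative differences can be derived directly from the benchmark table. The short Python sketch below computes, for each use case, how much less time the Lucene-based indexer needed and the corresponding speedup factor (H2 time divided by Lucene time); a negative percentage means Lucene was slower. The two measures are related by speedup = 1 / (1 − reduction), so 40% less time corresponds to about 1.7 times faster and 70% less time to about 3.3 times faster. The helper name `compare` is just for illustration.

```python
# Timings copied from the benchmark table, in seconds: (Lucene, H2).
BENCHMARKS = {
    "Building index": (37.3, 65.3),
    "Re-building index": (29.5, 98.7),
    "Semantic analysis (default), 1 thread": (50.2, 46.7),
    "Semantic analysis (default), 4 threads": (21.9, 19.2),
    "Semantic analysis (all), 1 thread": (289.9, 380.5),
    "Semantic analysis (all), 4 threads": (116.1, 173.8),
    "Find all type declarations": (0.18, 0.37),
    "Find all function/method declarations": (0.54, 1.55),
}


def compare(lucene, h2):
    """Return (percent less time with Lucene, speedup factor H2 / Lucene)."""
    return round(100 * (h2 - lucene) / h2, 1), round(h2 / lucene, 2)


for name, (lucene, h2) in BENCHMARKS.items():
    saved, speedup = compare(lucene, h2)
    print(f"{name:40} {saved:6.1f}% less time  {speedup:.2f}x")
```

For example, building the index works out to 42.9% less time (1.75x) and re-building it to 70.1% less time (3.35x), while the default semantic checks are the one case where the Lucene-based index comes out slower.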

As you may have noticed, executing the default set of semantic checks is slightly slower with the Lucene-based index than with the H2-based index. The reason is that these checks are implemented to execute many queries that check whether a lexical unit exists, and most of these queries find nothing. Lucene is better at retrieving large result sets than at finding nothing. We are now looking into how to optimize these semantic checks.

Conclusion

Modernizing the PHP indexer has led to some spectacular improvements in the performance and user experience of Zend Studio. We are thrilled by the initial results and by the opportunities for further optimizations in future versions of Zend Studio.

We urge you to download Zend Studio 13.5 Early Access and give it a try!