parallel_for implementation for Project TBB (Intel) / HipHop (Facebook)
We set out to benchmark the HipHop engine against Apache, and then to see how our changes to the HipHop engine influence the performance of WordPress.
We started by performance testing WordPress on Apache and on HipHop. To get stats on pure PHP performance we used a tool called tsung, which is built for stress testing various protocols, including HTTP.
Tsung can be configured either to fetch the entire site, including static files such as stylesheets and images, or to fetch only the PHP-interpreted files. As we were not interested in performance numbers for static file serving, we decided to fetch only PHP files with tsung.
Tsung can simulate a typical “viral attack”: a small number of concurrent connections starts to build up load, which then grows to a large number of concurrent connections and finally drops off to a normal base load. We configured tsung to run a maximum of 250 concurrent connections. The simulator initiates the connections in three timed phases. The phase times on the simulator are preset, but the time the server needs to answer all the queries varies quite a bit, as we will see later.
The performance numbers for the initial session connection times were slightly in favor of Apache, but with growing numbers of simulated users its performance degraded quite dramatically, whereas HipHop, even in just-in-time mode, was very stable in terms of request times and resource consumption. The main problem with Apache was its resource consumption, which simply maxed out our development servers.
The next step was to prove that our parallel_for implementation worked and actually increased performance. Looking for a good place to hack WordPress and add our parallel_for to it, we settled on wp-includes/plugin.php. It contains a function called apply_filters, which had a foreach loop that looked like it could benefit from parallel execution. apply_filters is used quite often in the WordPress code base, so it seemed like a good place to get started.
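To make the idea of a parallel_for concrete, here is a minimal sketch in plain C++. It is not our HipHop patch and it does not use TBB: it swaps in std::thread so that it stays self-contained, and the names parallel_for and run_callbacks, along with their signatures, are hypothetical. It also assumes that the loop iterations are independent of each other, which has to be checked for any loop before parallelizing it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for a parallel_for (std::thread based, not TBB):
// splits [begin, end) into contiguous chunks and calls body(i) for every
// index, one chunk per worker thread. body must not throw.
template <typename Body>
void parallel_for(std::size_t begin, std::size_t end, Body body,
                  std::size_t max_threads = std::thread::hardware_concurrency()) {
    assert(begin <= end && "range must not be reversed");
    if (begin == end) return;
    const std::size_t n = end - begin;
    // hardware_concurrency() may return 0 when the value is unknown.
    const std::size_t workers =
        std::max<std::size_t>(1, std::min(max_threads, n));
    const std::size_t chunk = (n + workers - 1) / workers;  // ceiling division

    std::vector<std::thread> threads;
    threads.reserve(workers);
    for (std::size_t start = begin; start < end; start += chunk) {
        const std::size_t stop = std::min(start + chunk, end);
        threads.emplace_back([start, stop, &body] {
            for (std::size_t i = start; i < stop; ++i) body(i);
        });
    }
    // Wait for all chunks before returning, like a normal loop would.
    for (auto& t : threads) t.join();
}

// Hypothetical usage: run a list of independent callbacks on the same input.
// Each iteration writes only to its own result slot, so no locking is needed.
std::vector<std::string> run_callbacks(
    const std::vector<std::function<std::string(const std::string&)>>& callbacks,
    const std::string& input) {
    std::vector<std::string> results(callbacks.size());
    parallel_for(0, callbacks.size(), [&](std::size_t i) {
        results[i] = callbacks[i](input);
    });
    return results;
}
```

A library such as TBB adds work stealing and a reusable thread pool on top of this basic pattern, which avoids the cost of creating threads on every call.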
We started off with the WordPress version by Hui Chen, who maintains a very current patch set for running WordPress on HipHop, and added our patches for the parallel_for. Our HipHop version is likewise a fork of Hui Chen’s HipHop fork. We have a patch set for the parallel_for functionality that we have also pushed to our GitHub account.
Running the benchmarks we can see that our parallel_for version uses less memory and generates less load on the server. It also had lower response times across all measured characteristics in tsung. The graphs also show that the overall time the test needed to complete our simulation was considerably longer for the version without parallel_for.
Let's read the numbers. The parallel_for version needed nearly 100MB less memory to run the test and was close to twice as fast. We moved a lot of work from PHP to the HipHop engine, and our measurements indicate that PHP would greatly benefit from a parallel_for function that runs natively in C++. TBB looks like a perfect fit for those problems, and our results show this quite nicely.
If we look at the impact this one change had on the overall performance of WordPress on HipHop, we can easily see how a wider use of parallel_for would benefit performance on this stack.
the memory graph:
the CPU graph: