Box UK: reducing congestion with Blackfire.io

By Christophe Dujarric, on Nov 17, 2015

The following article was written by Box UK developer Rob Rhoades (@studioromeo) during the #FireUpMyMac contest. Find the original here.

He managed to cut a page's load time from 9.36s to 0.1s, with most of the gain coming from I/O: time spent on I/O dropped from 8.87s to ~0.03s. Discover how he found and solved the issue:

When writing a new application it can be easy to miss imperfections in the software’s flow. During the early stages these imperfections may have very little noticeable effect on the application and to a developer may appear invisible. But as the application grows so can the impact of these imperfections. This can range from slowing down a page to causing parts of the application to malfunction.

We have recently been working on an analytics engine using PHP and MongoDB. We chose MongoDB because of its ability to handle high data throughput which seemed ideal. We collect the data through a variety of data-sources, some automatic and some manual, and visualise them using the Google Chart API.

Our Challenge

During the initial building of the analytics application we were using quite a small dataset of a few thousand documents; however, as the data grew we noticed that some charts were taking much longer to render.

The most noticeable was our ‘trends view’ page, which actually hit the server’s execution timeout, causing the chart not to render at all. This page aggregated statistics across a six-month period for every document stored, but only displayed one type of statistic at a time: with both the X and Y axes already in use, we opted to expose the various statistic types through the navigation instead.

Profiling in PHP

Identifying bottlenecks can often be like looking for a needle in a haystack, especially as the code grows in size and complexity. Thankfully tools exist which can aid us in tracking down where potential bottlenecks are in the application.

Traditionally developers have turned to using a tool like xhprof to obtain the profile data and then a GUI to visualise it. While this works it can be time-consuming to set up, which is why we opted to use another tool called Blackfire, created by the folks at SensioLabs.

In a nutshell it’s like a hosted xhprof cranked up to 11. It includes interactive call graphs and metrics on CPU, memory and disk usage for each stage. One of our favourite features is the profile comparisons, which allow you to see the impact of any code changes.

Blackfire offers multiple ways to profile your application; in our case we used the web companion to test individual pages and the Blackfire executable to profile PHP scripts on the server.

Identifying the bottleneck

After installing Blackfire on our development server, we first created a baseline profile so we could investigate what the code was doing while the trends chart loaded. This also served as a control for any further performance tweaks we made to the application.

As I mentioned earlier, we split the trends view chart’s statistics across multiple pages. Inspecting the profile report from Blackfire, we noticed an excessive number of calls to the getMaxValue() method, which queries the MongoDB server.

[Blackfire call graph highlighting the calls to getMaxValue()]

The method queries the database for the highest value statistic by asking the database to sort by the statistic in descending order and then grabbing the first result. The problem with this approach is it requires the database server to sort through every single record which can be slow if it’s reading the documents from disk.

/**
 * Get the maximum value of a key
 *
 * @param  string    $key
 * @param  array     $find
 * @param  \DateTime $month
 * @return mixed
 */
public function getMaxValue($key, $find, $month)
{
    $key = $month->format('Y_m') . '_' . $key;

    $value = $this->collection->find($find, array($key))
        ->sort(array($key => -1))
        ->limit(1)
        ->getNext();

    return $value[$key];
}

We also noted that this method was being called 24 times per request, when it should have been called at most six times (once for each of the six months in the trend). The code was calculating the maximum value not just for the particular metric we were viewing, but for every other metric available in the trend view as well. The view layer would then simply throw this extra information away, as it didn’t need it.

Our Solution

To fix this we changed our routes to specify the exact metric we were interested in, so we were no longer calculating the max value of every metric for every month.
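Sketched in isolation, the change amounts to resolving a single metric from the URL rather than iterating over all of them. This is a hypothetical illustration, not the project's actual code; the metric names and the metricsToCompute() helper are made up for the example:

```php
<?php
// Hypothetical illustration: the route now carries the metric name
// (e.g. /trends/visits), so only that metric's max values are computed.
$availableMetrics = array('visits', 'signups', 'downloads', 'errors');

function metricsToCompute($requestedMetric, array $available)
{
    // Before the fix the code effectively did the work for every
    // available metric; afterwards, only the one named in the route.
    return in_array($requestedMetric, $available, true)
        ? array($requestedMetric)
        : array();
}

// 6 months x 1 metric = 6 getMaxValue() calls instead of 24.
$toCompute = metricsToCompute('visits', $availableMetrics);
```

With one metric and six months, the per-request call count drops from 24 back down to six.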

We also improved the indexing on the MongoDB collection so that the highest value for each month could be read straight from an index. Because MongoDB holds frequently used indexes in memory, fetching these values from the database became extremely fast.
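As a rough sketch of that idea (our reconstruction, not the exact code used): with a descending index on each month's statistic field, the sort-then-limit query in getMaxValue() can be answered from the index alone instead of scanning every document. The "visits" field suffix and the months below are illustrative:

```php
<?php
// Build a descending-index spec for each month's statistic field.
// Field names like "2015_06_visits" follow the Y_m prefix convention
// used by getMaxValue(); "visits" is a made-up example metric.
$months = array(new DateTime('2015-06-01'), new DateTime('2015-07-01'));

$specs = array();
foreach ($months as $month) {
    $key = $month->format('Y_m') . '_visits'; // e.g. "2015_06_visits"
    $specs[$key] = array($key => -1);         // -1 = descending order
}

// Each spec would then be handed to the driver, e.g. with the
// legacy Mongo extension:
//     $collection->createIndex($specs['2015_06_visits']);
// so that sort(array($key => -1))->limit(1) becomes an index lookup.
print_r(array_keys($specs)); // lists "2015_06_visits", "2015_07_visits"
```

One index per monthly field costs some write overhead, but turns the worst query on the page into a single index read.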

// $metricId and $max[$metricId] are set up earlier from the metric named
// in the route; self::$sortedColumns maps it to a statistic field suffix.
$monthKeys = array();
foreach ($months as $month) {
    $monthKeys[] = $month->format('Y_m') . '_' . self::$sortedColumns[$keys[$metricId]];
    $or[] = array(
        $month->format('Y_m') => true
    );
}

// For each month, read the highest value straight off the descending
// index: sort on the indexed key and take the first document.
foreach ($monthKeys as $monthKey) {
    $value = $this->collection->find(
        array(),
        array($monthKey)
    )
        ->sort(array($monthKey => -1))
        ->limit(1)
        ->getNext();

    if (isset($value[$monthKey]) && $value[$monthKey] > $max[$metricId]) {
        $max[$metricId] = $value[$monthKey];
    }
}

[Blackfire profile: before the fix]

This meant we were no longer crawling through the collection looking for the highest value, as we could simply pick it out of the in-memory index.

[Blackfire profile: after the fix]

As a result we ended up slashing the page load time from 9.36s to 0.1s (roughly a 94× improvement)! Most of this came from the reduction in time spent on I/O, down from 8.87s to ~0.03s, most likely because the required data was now held in memory.

Final thoughts

Over the next few weeks we used Blackfire extensively to profile other parts of the application, as well as to test some low-level PHP scripts which dealt with a lot of data. Since we started profiling, our application has become noticeably snappier; we also have a much greater understanding of its flow, so are better prepared to address any future imperfections that might cause bottlenecks.

Finding and solving this bottleneck without profiling would have been tricky – the only other way we could have investigated it was with a step debugger such as Xdebug, but even that wouldn’t have told us how much time was spent in each part.

Blackfire was really helpful too because it was much easier to set up and use than xhprof, and with its comparison feature we could see whether we were heading in the right direction.

We would definitely use Blackfire again on another project, whether we needed to solve a particular performance problem or just identify the flow of the application.

Christophe Dujarric

Christophe is the Chief Product Officer at Blackfire. He's an engineer, but probably one of the least "tech" people in the company. He's wearing many hats, from product management to marketing and sales. He loves the beauty of simple solutions that solve actual problems.