Understanding continuous profiling: part 3

Our journey behind the observability scenes ends with continuous profiling, whose intermittent data collection offers a unique tradeoff between the quality of the available information and minimal overhead.


By Thomas di Luccio, on Jun 19, 2024

We embarked on a journey to explore the different dynamics of observability tools and features. While our goal was to understand the specifics of continuous profiling, we also wanted to sharpen our understanding of deterministic and probabilistic observability, as those terms might not be the most self-explanatory.

We have seen, and experienced, that deterministic data collection is triggered for a selection of activities (HTTP requests or CLI commands) and observes the resource consumption of each function and service call in the application, across different dimensions, from the very first call to the last.

Because the entirety of a request's or script's activity is scrutinized, reliable metrics can be computed, and two samples can be compared with each other as well as against user-defined expectations.
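
Python's standard library ships a deterministic profiler, `cProfile`, which makes this point concrete. The sketch below is an illustration, not Blackfire's implementation: it profiles a deliberately naive recursive `fib` (a hypothetical workload chosen because its call count is known exactly) on demand, and shows that two runs report identical call counts that can be compared directly.

```python
import cProfile
import pstats


def fib(n: int) -> int:
    """Deliberately naive recursive Fibonacci: its call count is known exactly."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)


def count_calls(func_name: str, arg: int) -> int:
    """Profile fib(arg) deterministically and return how many times func_name ran."""
    profiler = cProfile.Profile()
    profiler.enable()   # profiling is switched on on demand...
    fib(arg)
    profiler.disable()  # ...and off again, so only this run is observed
    stats = pstats.Stats(profiler).stats
    # Keys are (filename, lineno, funcname); values start with
    # (primitive calls, total calls, tottime, cumtime, callers).
    return sum(nc for (_, _, name), (_, nc, *_) in stats.items() if name == func_name)


# Every call is recorded, so two runs give identical, directly comparable counts.
run_a = count_calls("fib", 15)
run_b = count_calls("fib", 15)
print(run_a, run_b, run_a == run_b)  # prints "1973 1973 True"
```

Because nothing is left to chance, a difference between two such runs reflects a real change in the code rather than sampling noise.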

Probabilistic and deterministic observability

Exploring Blackfire monitoring helps us highlight the stakes of probabilistic observability, since monitoring is both deterministic and probabilistic. The quality of the information produced by a probabilistic system is directly correlated with how representative the collected samples are. A low sampling rate can lead to partial and biased measurements.
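
To see why representativeness matters, here is a small simulation. It is a sketch under stated assumptions: the traffic is synthetic (about 2% of 100,000 hypothetical requests are slow), and each request is kept independently with probability `rate`, a simplification of how a real agent may decide what to sample. With a low rate, only a handful of requests are sampled, and the estimated share of slow requests can drift far from the true value.

```python
import random


def estimate_slow_share(requests: list[bool], rate: float, rng: random.Random) -> tuple[int, float]:
    """Keep each request with probability `rate`; return (#sampled, estimated slow share)."""
    sampled = [is_slow for is_slow in requests if rng.random() < rate]
    if not sampled:
        return 0, float("nan")  # nothing observed: no estimate at all
    return len(sampled), sum(sampled) / len(sampled)


rng = random.Random(42)
# Hypothetical traffic: True marks a slow request (about 2% of them).
traffic = [rng.random() < 0.02 for _ in range(100_000)]
true_share = sum(traffic) / len(traffic)

for rate in (0.001, 0.01, 0.1):
    n, est = estimate_slow_share(traffic, rate, rng)
    print(f"rate={rate:<6} samples={n:<6} estimate={est:.4f} (true {true_share:.4f})")
```

Re-running with different seeds shows the low-rate estimates jumping around much more than the high-rate ones.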

Therefore, correctly defining those sampling rates is key. More broadly, it’s about better understanding and controlling the portions of your applications and traffic that need to be observed using each layer of observability available (monitoring trace, extended trace, profile).

One of the most critical questions when setting up an observability strategy is the ratio between the information available and the overhead caused by data collection. We need to ensure that we are only collecting valuable data and enriching it with the most actionable information possible.

Critical information vs overhead ratio

This is also what makes the Blackfire observability solution unique. We looked for every possible way to extract actionable insights and valuable information from the collected data, and we side with developers by letting them control their overhead.

The strict correlation between the amount of data collected and the overhead it causes makes this ratio a burning issue for observability. A profile represents the maximum amount of data that can be collected and therefore causes the maximum overhead.

Yet, in most cases, only the developer triggering that profile is affected by its overhead. A monitoring trace, which represents the minimum amount of data collected, causes only minimal overhead.

Having the best of both worlds? Intermittent data collection

One possible way to circumvent the tension between information and overhead is to collect data intermittently. This is the logic behind continuous profiling. It aims to collect as much data as possible across multiple dimensions, but only at a defined frequency.

On the one hand, the deterministic profiler is activated on demand and then continuously collects data for each function and service call. On the other hand, the continuous profiler is always running, inspecting the activity on all threads at a given frequency. If a function or service is being executed when the clock ticks, information such as the call stack and its current contribution to all activated dimensions is collected.
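
A tick-based sampler can be sketched in a few lines of Python. This is a simplified wall-clock sampler, not Blackfire's continuous profiler: it relies on `sys._current_frames()`, a CPython implementation detail, to grab the stack of every other thread at each tick, and it only records call stacks, whereas the profiler described here also records each frame's contribution to the activated dimensions. `busy_work` is a hypothetical workload.

```python
import collections
import sys
import threading
import time
import traceback


def sample_stacks(interval: float, duration: float) -> collections.Counter:
    """At every tick, record the call stack of every thread except the sampler's own."""
    samples: collections.Counter = collections.Counter()
    sampler_id = threading.get_ident()
    deadline = time.monotonic() + duration
    while time.monotonic() < deadline:
        for thread_id, frame in sys._current_frames().items():
            if thread_id == sampler_id:
                continue  # don't sample the sampler itself
            # Root-first tuple of function names for this thread's current stack
            stack = tuple(entry.name for entry in traceback.extract_stack(frame))
            samples[stack] += 1
        time.sleep(interval)  # anything that starts and ends between ticks is missed
    return samples


def busy_work(stop: threading.Event) -> None:
    """Hypothetical CPU-bound workload running in a background thread."""
    while not stop.is_set():
        sum(i * i for i in range(10_000))


stop = threading.Event()
worker = threading.Thread(target=busy_work, args=(stop,), daemon=True)
worker.start()
samples = sample_stacks(interval=0.005, duration=0.3)
stop.set()
worker.join()

# The most frequently observed stacks approximate where the time is spent.
for stack, count in samples.most_common(3):
    print(count, ";".join(stack))
```

The `time.sleep(interval)` line is the whole tradeoff in miniature: a longer interval means less overhead but coarser data.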

Understanding continuous profiling

This leads us to uncover some important aspects of continuous profiling. First, continuous profiling is holistic in the sense that, once enabled, it watches all activity on the server for the given runtime. It does not rely on a mechanism to target specific requests or CLI commands, as the deterministic profiler does.

Secondly, we can infer that each sample contains only partial information, so samples can't be compared with one another. As a consequence, metrics can't be defined and evaluated, and performance tests cannot be derived from the continuous profiler.

It is also possible for some functions to go undetected by the continuous profiler: functions executed entirely between two ticks remain untraced and hidden. Yet the probability of observing them increases with the number of samples taken.
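
The intuition that more samples make short functions visible can be quantified with a back-of-the-envelope model. Assume, as a simplification, that a function runs for `duration` seconds, the profiler ticks every `interval` seconds, and the tick falls at a uniformly random point relative to each execution, independently across executions. A single execution is then caught with probability `duration / interval` (capped at 1), and at least one of `n` executions is caught with probability `1 - (1 - duration / interval) ** n`. The helper names are hypothetical.

```python
import math


def detection_probability(duration: float, interval: float, executions: int) -> float:
    """Chance that at least one of `executions` runs overlaps a profiler tick."""
    p_single = min(1.0, duration / interval)
    return 1.0 - (1.0 - p_single) ** executions


def executions_needed(duration: float, interval: float, confidence: float) -> int:
    """Executions required to be seen with probability >= `confidence`.

    Assumes duration > 0 and 0 <= confidence < 1.
    """
    p_single = min(1.0, duration / interval)
    if p_single >= 1.0:
        return 1  # longer than the interval: every execution is caught
    return math.ceil(math.log1p(-confidence) / math.log1p(-p_single))


# A hypothetical 1 ms function sampled every 10 ms:
print(round(detection_probability(0.001, 0.01, 10), 4))  # 0.6513
print(round(detection_probability(0.001, 0.01, 50), 4))  # 0.9948
print(executions_needed(0.001, 0.01, 0.99))              # 44
```

Under this model, a function invisible in any single tick becomes almost certain to show up once it has run a few dozen times during the observed period.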

Therefore, we can conclude that continuous profiling is entirely probabilistic. The quality of the information obtained depends entirely on how representative the samples are. Since it is holistic and observes all activity, we need to consider a period of time long enough to obtain high-quality, relevant data.
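
Because the data is probabilistic, a frame's share of the samples is an estimate whose precision depends on how many samples were collected. As an illustration (the standard normal-approximation interval for a proportion, not a feature of Blackfire's dashboard), the sketch below compares a frame seen in 15% of 200 samples with one seen in 15% of 10,000 samples: the point estimate is the same, but with fifty times more samples the interval is about seven times narrower (the square root of 50).

```python
import math


def share_with_interval(hits: int, total: int, z: float = 1.96) -> tuple[float, float, float]:
    """Estimated share of samples containing a frame, with an approximate 95% interval."""
    share = hits / total
    half_width = z * math.sqrt(share * (1.0 - share) / total)
    return share, max(0.0, share - half_width), min(1.0, share + half_width)


for hits, total in ((30, 200), (1_500, 10_000)):
    share, low, high = share_with_interval(hits, total)
    print(f"{total:>6} samples: {share:.1%} (95% CI {low:.1%} to {high:.1%})")
```

This is one way to read "a period of time large enough": long enough that the intervals around the frames you care about become narrow.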

Exploring the flame graph and table view

The data collected by Blackfire's continuous profiler is displayed in a dashboard designed to kickstart your performance optimization journey. It includes a graph showing how the resource consumption for the selected dimension evolves over the selected time frame.

The table view lists all the frames, or function calls, sorted by their resource consumption. By default, the table is sorted by exclusive resource consumption: the total value of the frame minus the combined total values of its direct children.
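
The exclusive value is a simple subtraction over the call tree. Here is a minimal sketch, assuming a hypothetical `Frame` type that carries each frame's total (inclusive) value and made-up wall-time numbers in milliseconds. For brevity it lists each node separately instead of merging identical functions reached through different call paths, as a real table view would.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    name: str
    total: float  # inclusive cost: the frame plus everything it calls
    children: list["Frame"] = field(default_factory=list)

    @property
    def exclusive(self) -> float:
        # Total value minus the combined totals of the direct children
        return self.total - sum(child.total for child in self.children)


def table_rows(root: Frame) -> list[tuple[str, float, float]]:
    """Flatten the tree into (name, exclusive, inclusive) rows, sorted by exclusive cost."""
    rows = []
    stack = [root]
    while stack:
        frame = stack.pop()
        rows.append((frame.name, frame.exclusive, frame.total))
        stack.extend(frame.children)
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Hypothetical wall-time values (ms)
root = Frame("main", 100, [
    Frame("handle_request", 80, [Frame("query_db", 50), Frame("render", 20)]),
    Frame("log", 5),
])
for name, exclusive, inclusive in table_rows(root):
    print(f"{name:<15} exclusive={exclusive:>5} inclusive={inclusive:>5}")
# query_db (exclusive 50) comes first, although main has the largest inclusive value
```

Sorting by exclusive value surfaces the frames doing the work themselves rather than the callers that merely wrap them.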

The flame graph is a hierarchical visualization of the contribution of the different function calls to the selected dimension. Flame graphs bear an uncanny resemblance to the deterministic profile timeline, but they are read quite differently.

Each box represents one frame, or function call, but boxes are sorted alphabetically on the x-axis rather than chronologically, since a flame graph aggregates all activity on all threads over the selected time frame. Flame graphs are effective for identifying performance issues and understanding how an application behaves during execution.
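
These layout rules can be sketched as follows. The sketch follows the classic flame graph construction rather than any documented Blackfire internals: identical root-first call stacks are merged into a tree, siblings are sorted alphabetically, and each box's width is proportional to the number of samples that passed through it, so position on the x-axis carries no notion of time. The sample stacks and the 1,000-unit canvas width are hypothetical.

```python
from collections import Counter


def build_tree(stacks: Counter) -> dict:
    """Merge root-first stacks into nested nodes: {"count": n, "children": {name: node}}."""
    root = {"count": 0, "children": {}}
    for stack, count in stacks.items():
        root["count"] += count
        node = root
        for name in stack:
            node = node["children"].setdefault(name, {"count": 0, "children": {}})
            node["count"] += count
    return root


def layout(node: dict, scale: float, x: float = 0.0, depth: int = 0) -> list[tuple[str, int, float, float]]:
    """Return (name, depth, x, width) boxes; width is proportional to sample count."""
    boxes = []
    for name in sorted(node["children"]):  # alphabetical order, not call order
        child = node["children"][name]
        width = child["count"] * scale
        boxes.append((name, depth, x, width))
        boxes.extend(layout(child, scale, x, depth + 1))  # children stack on their parent
        x += width
    return boxes


samples = Counter({
    ("main", "handle_request", "render"): 3,
    ("main", "handle_request", "query_db"): 6,
    ("main", "log"): 1,
})
tree = build_tree(samples)
for name, depth, x, width in layout(tree, scale=1000 / tree["count"]):
    print(f"{'  ' * depth}{name:<15} x={x:>6.1f} width={width:>6.1f}")
```

Although `render` was recorded first, `query_db` is drawn to its left: only the widths, not the positions, say anything about cost.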

Some boxes, or spans, have a colored background: these spans consume significantly more resources than all the others for the selected dimension and time frame. The stronger the color, the more resources are consumed. Double-clicking a span lets you drill down, reframing the context of what is currently displayed.
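
Drilling down can be approximated as re-rooting the data at the chosen frame. The hypothetical `drill_down` helper below keeps only the samples whose stack contains a given function and drops everything above its first occurrence, so that function spans the full width of the re-rendered graph. Real flame graph viewers usually zoom on one specific box (a specific call path) rather than on every occurrence of a function name; matching by name is a simplification made here.

```python
from collections import Counter


def drill_down(stacks: Counter, frame: str) -> Counter:
    """Keep samples passing through `frame`, re-rooted at its first occurrence."""
    zoomed: Counter = Counter()
    for stack, count in stacks.items():
        if frame in stack:
            zoomed[stack[stack.index(frame):]] += count
    return zoomed


samples = Counter({
    ("main", "handle_request", "query_db"): 6,
    ("main", "handle_request", "render"): 3,
    ("main", "log"): 1,
})
print(drill_down(samples, "handle_request"))
```

Samples outside the chosen frame, such as the `log` call here, simply disappear from the reframed view.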

Enriching your observability strategy with the continuous profiler

As our journey behind the observability scenes comes to an end, you should now have a clear understanding of the different data types and tools at your disposal. This puts you in a better position to use their respective strengths and weaknesses to your advantage as you build, or enrich, your observability strategy with our new continuous profiler for PHP, Python, Node.js, and Golang.

Let’s continue the conversation on Dev.to, Discord, Reddit, or our new community portal. Let us know how you are using our continuous profiler and which features you would love to see added to our solution.

To better observability and beyond!

The “Understanding continuous profiling” series:

part 1
part 2
part 3 and final (you are here)

Thomas di Luccio

Thomas is Product Manager at Platform.sh for Blackfire.io. He likes nothing more than understanding the users' needs and helping them find practical and empowering solutions. He’ll support you as a day-to-day user.