Profiling 101 for Python Developers: The Many Types of Profilers 2/6

By Sümer Cip, on Feb 03, 2020

There are three kinds of profilers:

  • Deterministic (Tracing) Profilers
  • Statistical (Sampling) Profilers
  • Memory Profilers

As explained earlier, a profiler makes specific measurements of the profiled software. The difference between deterministic and statistical profilers lies in when those measurements are made.

Deterministic Profilers

A deterministic profiler takes its measurements whenever a specific event occurs, such as a function call, a function return or an exception. These events are generally fired by the underlying runtime.

Deterministic profilers in Python work by using the setprofile API (sys.setprofile at the Python level). They hook function call and return events (plus a few more) to calculate their metrics.

Statistical Profilers

A statistical profiler takes its measurements at fixed intervals, so it loses some information between samples. But, as the name implies, given enough data it gives a statistically accurate picture of what is happening in the system. There are two categories of statistical profiler implementations: external sampling and internal sampling.

External Sampling

Sampling is done by a separate process, not by the profiled interpreter itself. This approach has some unique benefits:

  • you don’t need to instrument the profiled application,
  • any programming error in the profiler will not affect the profiled application.

As you might expect, though, everything comes at a cost. Implementing a robust external sampling profiler is hard. Why? Because you are trying to read the call stacks of the threads of an external application.

This means you need to:

  • find the external process;
  • attach to it;
  • read its memory;
  • identify where the call stack starts in memory;
  • then read the call stack.

You will also have to cope with the version of the interpreter itself, since Python's internal data structures change from version to version.

There are pretty good examples in this category: py-spy and pyflame use this approach. They read the external process's memory either via ptrace or via OS-specific APIs (ReadProcessMemory on Windows, process_vm_readv on Linux, vm_read on macOS) and then try to find the offset of the Python interpreter call stack.

Finding the offset is in itself a complex task because of ASLR (address space layout randomization), which changes where the interpreter is loaded in memory on every run. See here for more details.
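
To give a feel for the "identify where things start in memory" step, here is a small sketch of how an external tool on Linux might locate the interpreter's load address from the text of /proc/<pid>/maps. The function name find_base_address, the sample mapping text and the symbol offset are all hypothetical, and this is only one piece of the puzzle: it does not attach to a process or read any memory, and it is not a description of how py-spy or pyflame are implemented.

```python
def find_base_address(maps_text, needle="python"):
    """Return the lowest start address of a mapping whose path contains `needle`.

    `maps_text` is expected to be in the format of /proc/<pid>/maps on Linux:
    "start-end perms offset dev inode path".
    """
    starts = []
    for line in maps_text.splitlines():
        fields = line.split(maxsplit=5)
        if len(fields) < 6 or needle not in fields[5]:
            continue  # anonymous mapping, or a file we are not interested in
        start, _end = fields[0].split("-")
        starts.append(int(start, 16))
    return min(starts) if starts else None


# Hypothetical excerpt; with ASLR these addresses differ on every run.
SAMPLE_MAPS = """\
55d0c8a00000-55d0c8a6d000 r--p 00000000 08:01 1311 /usr/bin/python3.8
55d0c8a6d000-55d0c8d1c000 r-xp 0006d000 08:01 1311 /usr/bin/python3.8
7f3a5c000000-7f3a5c021000 rw-p 00000000 00:00 0
7ffd1b2e4000-7ffd1b305000 rw-p 00000000 00:00 0 [stack]
"""

base = find_base_address(SAMPLE_MAPS)

# Hypothetical offset of an interpreter symbol, as it might be read from the
# binary's symbol table. For a position-independent executable, the runtime
# address is roughly the load address plus that offset.
SYMBOL_OFFSET = 0x5A1B20
runtime_address = base + SYMBOL_OFFSET

print(hex(base), hex(runtime_address))
```

Because the base address is only known at runtime, the profiler has to repeat this lookup for every process it attaches to.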

Internal Sampling

In this mode, the sampling is done inside the interpreter.

With an internal sampling strategy, executing a sampler function at specific intervals can itself become a problem. One way to run a function at specific intervals on OSes like Linux is to use signals, and they come with their own issues:

  • They can interfere with I/O: if your application is waiting on a blocking system call (e.g. recv() on a socket), the signal will interrupt that call, which may then fail, so that the signal handler can run.
  • There may be other libraries/code using these signals.
  • They are POSIX only, thus not compatible with Windows.

Despite these issues, there are pretty decent profilers using this approach, too. For example, Google’s Python cloud profiler (Stackdriver), pyinstrument and plop all work by registering a sampler function that runs in the context of the profiled application.
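
The following is a minimal sketch of signal-based internal sampling using only the standard library. It assumes a POSIX system and that the code runs in the main thread, since Python only allows signal handlers to be installed there. The names sampler, samples, busy and profile are hypothetical, and the sketch makes no claim about how the profilers mentioned above are actually implemented.

```python
import collections
import signal
import time

samples = collections.Counter()  # "outer;inner" call stack -> number of samples


def sampler(signum, frame):
    """Signal handler: record the call stack that was running when the timer fired."""
    stack = []
    while frame is not None:
        stack.append(frame.f_code.co_name)
        frame = frame.f_back
    samples[";".join(reversed(stack))] += 1


def busy(cpu_seconds):
    """Burn CPU for roughly `cpu_seconds` so the sampler has something to observe."""
    end = time.process_time() + cpu_seconds
    total = 0
    while time.process_time() < end:
        total += 1
    return total


def profile(func, *args, interval=0.005):
    """Run func(*args) while sampling its stack every `interval` seconds of CPU time."""
    old_handler = signal.signal(signal.SIGPROF, sampler)     # POSIX only
    signal.setitimer(signal.ITIMER_PROF, interval, interval)
    try:
        return func(*args)
    finally:
        signal.setitimer(signal.ITIMER_PROF, 0)               # stop the timer
        signal.signal(signal.SIGPROF, old_handler)


profile(busy, 0.2)
for stack, count in samples.most_common(3):
    print(count, stack)
```

Unlike a deterministic profiler, this one only sees whatever happens to be running when the timer fires, so a function that executes very briefly may never show up in samples. The sketch also takes over SIGPROF, which illustrates the conflict with other code using the same signal.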

Memory Profilers

Memory profilers help you analyze memory usage and find leaks in your code. The causes of memory leaks can include:

  • dangling large objects which are not released,
  • reference leaks in underlying libraries or C extensions,
  • reference cycles.

Blackfire, a deterministic profiler

Blackfire is a deterministic profiler which requires no instrumentation of the code:

  • no code change needed to profile, nor to revert once you push to a live environment;
  • no impact for end-users, as we’ll discover in the next articles.

Sümer Cip

Sümer Cip is currently a Senior Software Engineer at Blackfire. He has been coding for nearly 20 years and has contributed to many open source projects in his career. He has a deep love for Python and an obsession with code performance and low-level stuff.