Profiling 101 for Python Developers: The Many Types of Profilers 2/6
This article is the second of a series of six on Profilers in the Python world, and how Blackfire is the best-of-breed tool to introspect code behavior and optimize its performance.
Blog post series index:
- What is a Profiler?
- The Many Types of Profilers (you are here)
- List of Existing Python Profilers
- Picking the Right Profiler
- Profiles visualizations
- Using Blackfire
Types of Profilers
There are three kinds of profilers:
- Deterministic (Tracing) Profilers
- Statistical (Sampling) Profilers
- Memory Profilers
As we explained earlier, a profiler makes specific measurements of the profiled software. The difference between deterministic and statistical profilers lies in when those measurements are made.
A deterministic profiler makes measurements when a certain event happens, such as a function call, a function return, or an exception. These events are generally fired by the underlying runtime.
Deterministic profilers in Python work by using the setprofile API (sys.setprofile). They hook function call and return events (plus a few more) to calculate the metrics.
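To make the hook mechanism concrete, here is a minimal sketch of a deterministic profiler built on `sys.setprofile`. The helper name `profile_calls` and the shape of the `stats` dictionary are illustrative assumptions, not part of any real profiler; production tools also handle C functions, threads, and far more careful timing.

```python
import sys
import time
from collections import defaultdict


def profile_calls(func, *args, **kwargs):
    """Run func under a toy deterministic profiler; return (result, stats)."""
    stats = defaultdict(lambda: {"calls": 0, "total_time": 0.0})
    stack = []  # start timestamps of the Python functions currently running

    def hook(frame, event, arg):
        # The interpreter fires "call" and "return" for Python functions.
        # C function events ("c_call", "c_return", ...) are ignored here.
        if event == "call":
            stack.append(time.perf_counter())
        elif event == "return" and stack:
            elapsed = time.perf_counter() - stack.pop()
            entry = stats[frame.f_code.co_name]
            entry["calls"] += 1
            # Inclusive time: nested (and recursive) calls are counted too.
            entry["total_time"] += elapsed

    sys.setprofile(hook)
    try:
        result = func(*args, **kwargs)
    finally:
        sys.setprofile(None)  # always remove the hook
    return result, dict(stats)


def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)


result, stats = profile_calls(fib, 10)
print(result, stats["fib"]["calls"])
```

Because the hook runs on every call and return, the call count is exact, which is the defining property of this family of profilers. The price is that the hook itself adds overhead to every function call.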
A statistical profiler makes its measurements at specific intervals, so it loses some information. But, as the name implies, with enough data it gives a statistically accurate picture of what is happening in the system. There are two categories of statistical profiler implementations: external sampling and internal sampling.
With external sampling, the sampling is done by a separate process rather than by the profiled interpreter itself. This approach has some unique benefits:
- you don’t need to instrument the profiled application,
- any programming error in the profiler will not affect the profiled application.
As you might expect, though, everything comes at a cost. Implementing a robust external sampling profiler is hard. Why? Because you are trying to read the call stacks of the threads of an external application.
This means you need to:
- find the external process;
- attach to it;
- read its memory;
- identify where the call stack starts in memory;
- then read the call stack.
And you will have to cope with the version of the interpreter itself, since Python's internal data structures change from version to version.
There are pretty good examples in this category: py-spy and pyflame use this approach. They read the external process's memory either via ptrace or some OS APIs (read_vm) and then try to find the offset of the Python interpreter call stack.
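Finding the offset is itself a complex task because of address space layout randomization (ASLR): the interpreter is loaded at a different address on every run. See here for more details.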
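As an illustration of the "find where things live in memory" step, the sketch below parses the text format of Linux's `/proc/<pid>/maps` to recover the address at which a binary was loaded. This is only one small piece of the job and an assumption about how one might start on Linux; it is not a description of how py-spy or pyflame are implemented. The helper names and the sample addresses are made up for the example.

```python
from typing import NamedTuple, Optional


class Mapping(NamedTuple):
    start: int
    end: int
    perms: str
    path: str


def parse_maps_line(line: str) -> Mapping:
    """Parse one line of /proc/<pid>/maps.

    Format: start-end perms offset dev inode [pathname]
    """
    fields = line.split(maxsplit=5)
    start, end = (int(part, 16) for part in fields[0].split("-"))
    path = fields[5].strip() if len(fields) > 5 else ""
    return Mapping(start, end, fields[1], path)


def find_base_address(maps_text: str, name: str) -> Optional[int]:
    """Return the start of the first mapping whose path contains `name`."""
    for line in maps_text.splitlines():
        if not line.strip():
            continue
        mapping = parse_maps_line(line)
        if name in mapping.path:
            return mapping.start  # the file is sorted by address
    return None


# Made-up sample; with ASLR these addresses differ on every run.
SAMPLE_MAPS = """\
55d0c8a00000-55d0c8a6c000 r--p 00000000 08:01 1311  /usr/bin/python3.11
55d0c8a6c000-55d0c8d1a000 r-xp 0006c000 08:01 1311  /usr/bin/python3.11
7ffd3b5f2000-7ffd3b613000 rw-p 00000000 00:00 0     [stack]
"""

base = find_base_address(SAMPLE_MAPS, "python")
print(hex(base))
```

On a real system the text would come from reading `/proc/<pid>/maps` for the target process. Knowing the load address is what lets a profiler turn a symbol's offset inside the binary into an absolute address it can read.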
With internal sampling, the sampling is done inside the profiled interpreter itself.
With an internal sampling strategy, executing a sampler function at specific intervals might become a problem. One of the ways to execute functions at specific intervals on OSes like Linux is to use signals, and they come with their own issues:
- They can interfere with I/O: if your application is waiting on a blocking system call (e.g. recv() from a socket), the signal will force that call to fail so that the signal handler can run.
- There may be other libraries/code using these signals.
- They are POSIX only, thus not compatible with Windows.
And again, despite the issues above, there are pretty decent profilers using this approach, too. As examples, Google’s Python cloud profiler stackdriver, pyinstrument and plop all work by registering a sampler function that runs in the context of the profiled application.
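The signal-based approach can be sketched with the standard `signal` module. The class name `SignalSampler` and the `busy` workload are hypothetical, chosen for this example; the sketch only works on POSIX systems and only when started from the main thread, which mirrors the limitations listed above.

```python
import collections
import signal
import time


class SignalSampler:
    """Toy internal sampling profiler driven by SIGALRM."""

    def __init__(self, interval=0.005):
        self.interval = interval  # seconds between samples
        self.samples = collections.Counter()
        self._old_handler = None

    def _sample(self, signum, frame):
        # `frame` is the Python frame that was running when the signal
        # was handled; record the name of its function.
        if frame is not None:
            self.samples[frame.f_code.co_name] += 1

    def start(self):
        self._old_handler = signal.signal(signal.SIGALRM, self._sample)
        # Fire SIGALRM after `interval` seconds, then every `interval`.
        signal.setitimer(signal.ITIMER_REAL, self.interval, self.interval)

    def stop(self):
        signal.setitimer(signal.ITIMER_REAL, 0)  # cancel the timer
        old = self._old_handler
        signal.signal(signal.SIGALRM, signal.SIG_DFL if old is None else old)


def busy(duration):
    """Burn CPU for roughly `duration` seconds."""
    deadline = time.monotonic() + duration
    total = 0
    while time.monotonic() < deadline:
        total += 1
    return total


sampler = SignalSampler()
sampler.start()
busy(0.2)
sampler.stop()
print(sampler.samples.most_common(3))
```

Unlike a deterministic profiler, this one never sees individual calls. It only knows which function happened to be running at each tick, so functions that run for less than one interval can be missed entirely.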
Memory profilers help you analyze memory usage and find leaks in your code. The causes of memory leaks can include:
- dangling large objects which are not released,
- underlying libraries/C extensions reference leaks,
- reference cycles.
Blackfire, a deterministic profiler
Blackfire is a deterministic profiler which requires no instrumentation of the code:
- no code change needed to profile, nor to revert once you push to a live environment;
- no impact for end-users, as we’ll discover in the next articles.
Next article: List of Existing Python Profilers