The Challenges of Async Python Observability: Profiling asynchronous code 2/3
How does profiling fit into asynchronous Python apps using asyncio and FastAPI?
Blog post series index:
Introduction to FastAPI and asyncio
Profiling Asynchronous Code (you are here)
Blackfire to the Rescue
Let’s extend the previous example a bit more by adding two synchronous functions:
- one is I/O-intensive: `read_big_file()`;
- the other one is CPU-intensive: `parse_contents()`, which parses the file read by `read_big_file()`.
```python
import asyncio
import time

import httpx
from fastapi import Request


def burn_cpu(secs):
    t0 = time.time()
    elapsed = 0
    while elapsed < secs:
        for _ in range(1000):
            pass
        elapsed = time.time() - t0


def fetch_from_db_sync():
    time.sleep(1.5)


def read_big_file():
    time.sleep(1.0)
    return "some big content"


def parse_contents(_):
    burn_cpu(1.0)


async def do_http_request():
    async with httpx.AsyncClient() as client:
        r = await client.get('https://google.com')
        print(r.status_code)


async def home(request: Request):
    contents = read_big_file()
    parse_contents(contents)
    tasks = [do_http_request(), asyncio.to_thread(fetch_from_db_sync)]
    results = await asyncio.gather(*tasks)
    return {"results": results}
```
As these two new functions (`read_big_file` and `parse_contents`) are synchronous, and as asyncio is a cooperatively scheduled framework, no context switch happens during their execution. In other words, while these functions run, no other asyncio task gets a chance to run (assuming there is only a single thread, which is the case here). In FastAPI terms, this means that no concurrent HTTP requests are handled until we have read the file and parsed it. This kind of issue, blocking the event loop, can drastically decrease performance.
It is usually advised not to read a big file or perform a CPU-intensive operation within an HTTP handler. But if you happen to do it, you might lose all the benefits that come with asyncio, and debugging these kinds of issues can be hard due to the asynchronous nature of the application itself.
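One common mitigation is to push blocking work off the event loop, just as the example already does for `fetch_from_db_sync` via `asyncio.to_thread`. Here is a minimal, self-contained sketch of the same technique applied to the file read (the shortened sleeps and the `home_offloaded`/`main` names are illustrative, not part of the original app):

```python
import asyncio
import time


def read_big_file():
    # Stand-in for blocking I/O (shortened from the original example)
    time.sleep(0.2)
    return "some big content"


async def home_offloaded():
    # asyncio.to_thread() runs the blocking call in a worker thread,
    # so the event loop stays free to run other tasks meanwhile.
    return await asyncio.to_thread(read_big_file)


async def main():
    # A second task makes progress while the "file" is being read.
    return await asyncio.gather(
        home_offloaded(),
        asyncio.sleep(0.05, result="other task ran"),
    )


print(asyncio.run(main()))  # → ['some big content', 'other task ran']
```

Had `read_big_file` been awaited directly on the event loop, the second task could not have run until the sleep finished.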
So, what can we do when we spot a performance issue in an ASGI application? Use a profiler!
Let’s run the example above with cProfile. Below is the output we get:
```
ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
    35    0.000    0.000    0.163    0.005  t8.py:30(do_http_request)
     2    0.000    0.000    0.000    0.000  t8.py:37(home)
```
This is rather confusing and worth answering a few questions:

- Where is the call to `fetch_from_db_sync`?
- Why is `home` called twice, and why does its wall time seem to be 0?
- Why does `do_http_request` seem to be called 35 times instead of once?
Let’s begin with the first question, about `fetch_from_db_sync`. As the function is called via `asyncio.to_thread`, it actually runs within another thread. cProfile cannot profile code running in other threads, and as a result, the `fetch_from_db_sync` call doesn’t appear in the profile.
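This is easy to reproduce outside of FastAPI. In the sketch below (the helper names are illustrative), a function executed in a worker thread never shows up in the profile, while the caller that stays on the profiled thread does:

```python
import cProfile
import pstats
import threading
import time


def fetch_from_db_sync():
    time.sleep(0.05)


def run_in_thread():
    t = threading.Thread(target=fetch_from_db_sync)
    t.start()
    t.join()


profiler = cProfile.Profile()
profiler.enable()
run_in_thread()
profiler.disable()

# cProfile only instruments the thread it was enabled in, so the
# worker thread's frames are invisible to it.
profiled = {func for (_, _, func) in pstats.Stats(profiler).stats}
print("run_in_thread" in profiled)       # → True
print("fetch_from_db_sync" in profiled)  # → False
```

The same applies to the FastAPI example: `home` shows up in the stats, but the `asyncio.to_thread(fetch_from_db_sync)` work does not.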
The second question is trickier. cProfile, like Blackfire, is a deterministic profiler: whenever a function is called or exits, a profiler event is generated, and cProfile takes its measurements at these points. This scheme works well with synchronous code.
However, with asynchronous code, another thing happens: whenever an `await` is reached and the event loop context-switches to another task, a coroutine "suspend" event is generated, which cProfile sees as a function "exit" event (and the later resume as a new "call"). As a result, the call count for `home` is 2: the single `await` in the function is counted as an extra function call.
As for the wall time sticking to zero, the time spent in `await asyncio.gather()` is not accumulated into the `home()` function because, according to cProfile, the function had already exited.
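This double-counting is easy to observe with a minimal coroutine (a standalone sketch, not the FastAPI app itself). A single `await` means one suspend and one resume, which cProfile records as two calls:

```python
import asyncio
import cProfile
import pstats


async def home():
    # One suspension point: the suspend is recorded as an exit,
    # and the resume as a second call.
    await asyncio.sleep(0)


profiler = cProfile.Profile()
profiler.enable()
asyncio.run(home())
profiler.disable()

# pstats entries are (cc, nc, tt, ct, callers); nc is the number
# shown in the `ncalls` column.
home_ncalls = 0
for (filename, lineno, funcname), (cc, nc, tt, ct, callers) in (
        pstats.Stats(profiler).stats.items()):
    if funcname == "home":
        home_ncalls = nc
print("home ncalls =", home_ncalls)  # → home ncalls = 2
```

Each additional suspension point would add one more to the count, which is exactly why a coroutine with many task switches, like `do_http_request`, reports an inflated call count.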
The same issue applies to the `do_http_request()` call: it seems to be called 35 times because many task switches happen in the code that sends the HTTP request and receives the response.
While the details may be tricky to follow, the takeaway can be summarized as follows: in order to profile an async application and get relevant output, some additional effort is required. This is also true for sampling profilers.
In the next article, we’ll see how Blackfire can go way further.
Subscribe to Blackfire Monitoring
Start a Blackfire Monitoring subscription today and set up continuous observability of your Python applications.
Happy Performance Optimization!