The Challenges of Async Python Observability: Profiling asynchronous code 2/3

By Sümer Cip, on Jan 12, 2022

Blog post series index:

Introduction to FastAPI and asyncio
Profiling Asynchronous Code (you are here)
Blackfire to the Rescue (to be published)

Let’s extend the previous example a bit more by adding two synchronous functions: 

  • one is I/O intensive: read_big_file();
  • the other one is CPU intensive: parse_contents(), which parses the file read by read_big_file().
import asyncio
import time

import httpx
from fastapi import Request

def burn_cpu(secs):
    t0 = time.time()
    elapsed = 0
    while elapsed < secs:
        for _ in range(1000):
            pass
        elapsed = time.time() - t0

def fetch_from_db_sync():
    time.sleep(1.5)

def read_big_file():
    time.sleep(1.0)
    return "some big content"

def parse_contents(_):
    burn_cpu(1.0)

async def do_http_request():
    async with httpx.AsyncClient() as client:
        r = await client.get('https://google.com')
        print(r.status_code)

async def home(request: Request):
    contents = read_big_file()
    parse_contents(contents)
    tasks = [do_http_request(), asyncio.to_thread(fetch_from_db_sync)]
    results = await asyncio.gather(*tasks)
    return {"results": results}

As these two new functions (read_big_file and parse_contents) are synchronous, and as asyncio is a cooperatively scheduled framework, no context switch happens while they run. In other words, during their execution, no other asyncio task gets a chance to run (assuming a single thread, which is the case here). In FastAPI terms, this means that no concurrent HTTP requests are handled until we have read the file and parsed it. This kind of issue, blocking the event loop, can drastically decrease performance.

It is usually advised not to read a big file or perform a CPU-intensive operation inside an HTTP handler. But if you do, you might lose all the benefits that come with asyncio, and debugging these kinds of issues can be hard because of the asynchronous nature of the application itself.
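A common remedy, sketched below under the assumption that the blocking call can simply be moved off the event loop thread, is to wrap it with asyncio.to_thread (for CPU-bound work like parse_contents, a ProcessPoolExecutor may be needed instead because of the GIL). The timing check shows the two tasks now overlap instead of running back to back:

```python
import asyncio
import time

def read_big_file():
    time.sleep(0.2)  # blocking I/O stand-in, as in the example above
    return "some big content"

async def fast_task():
    # represents any other request the event loop should keep serving
    await asyncio.sleep(0.1)
    return "still responsive"

async def main():
    t0 = time.perf_counter()
    # to_thread moves the blocking call to a worker thread,
    # so fast_task can run while the file is being "read"
    contents, other = await asyncio.gather(
        asyncio.to_thread(read_big_file),
        fast_task(),
    )
    elapsed = time.perf_counter() - t0
    return contents, other, elapsed

contents, other, elapsed = asyncio.run(main())
print(contents, other, round(elapsed, 2))  # total is ~0.2s, not 0.3s
```

Had read_big_file been called directly, the total would be the sum of both durations; with to_thread it is roughly the maximum of the two.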

So, what can we do when we spot a performance issue in an ASGI application? Use a profiler!
Let’s run the code above with cProfile, for example. Below is the output we get:

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    35    0.000    0.000    0.163    0.005 t8.py:30(do_http_request)
    2     0.000    0.000    0.000    0.000 t8.py:37(home)
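For reference, output of this shape can be reproduced outside a running server with a minimal sketch like the following, where the network call and FastAPI are stripped out and time.sleep stands in for the blocking work:

```python
import asyncio
import cProfile
import pstats
import time

async def home():
    time.sleep(0.01)           # blocking work, as in the real handler
    await asyncio.sleep(0.01)  # stands in for await asyncio.gather(...)

profiler = cProfile.Profile()
profiler.enable()
asyncio.run(home())
profiler.disable()

# print the hottest entries by cumulative time, like the table above
pstats.Stats(profiler).sort_stats("cumulative").print_stats(10)
```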

This output is rather confusing and raises a few questions:

  • Where is the call to fetch_from_db_sync?
  • Why is home called twice, and why does its wall time seem to be 0?
  • Why does do_http_request seem to be called 35 times instead of 1?

Let’s begin with the first question, about fetch_from_db_sync. Because the function is called via asyncio.to_thread, it actually runs in another thread. cProfile cannot profile code running this way, so the fetch_from_db_sync call doesn’t appear in the profile.
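This is easy to verify: cProfile hooks the profiling callback only on the thread where it is enabled, so frames executed in a worker thread generate no events. A small sketch, assuming the same to_thread pattern as the handler above:

```python
import asyncio
import cProfile
import pstats
import time

def fetch_from_db_sync():
    time.sleep(0.05)  # runs in a worker thread, invisible to cProfile

async def main():
    await asyncio.to_thread(fetch_from_db_sync)

profiler = cProfile.Profile()
profiler.enable()  # enables profiling on the current (main) thread only
asyncio.run(main())
profiler.disable()

names = {key[2] for key in pstats.Stats(profiler).stats}
print("fetch_from_db_sync" in names)  # False: its frames were never seen
```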

The second question is trickier. cProfile is a deterministic profiler, like Blackfire: whenever a function is called or exits, a profiler event is generated, and cProfile takes its measurements at these points. This scheme works well with synchronous code.

However, with asynchronous code, something else happens: whenever an await is reached and the event loop context-switches to another task, a coroutine suspend event is generated, which translates to the same event as a function exit. As a result, the call count is 2: the initial call, plus the resume after the single await in the home function, which is counted as another call.

As for the wall time sticking to zero: none of the time spent in await asyncio.gather() is accumulated into home(), because from cProfile’s point of view, the function had already exited.
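Both effects can be observed in isolation. In this sketch, a simplified home() awaits a 0.2-second sleep standing in for the gather call; cProfile reports two calls and a cumulative time near zero, even though the coroutine took 0.2 seconds of wall time to complete:

```python
import asyncio
import cProfile
import pstats

async def home():
    # one await that suspends: stands in for await asyncio.gather(...)
    await asyncio.sleep(0.2)

profiler = cProfile.Profile()
profiler.enable()
asyncio.run(home())
profiler.disable()

# stats values are (primitive calls, total calls, tottime, cumtime, callers)
cc, nc, tt, ct, callers = next(
    value for key, value in pstats.Stats(profiler).stats.items()
    if key[2] == "home"
)
print(nc, round(ct, 3))  # 2 calls; cumtime stays near 0, not ~0.2
```

The 0.2 seconds pass inside the event loop’s selector while home is suspended, so cProfile attributes none of it to home.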

The same issue explains the do_http_request() numbers. It seems to be called 35 times because many task switches happen in the code that sends the HTTP request and receives the response.
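The call count grows with the number of suspensions, not the number of real calls. A hypothetical coroutine that suspends 34 times (each await asyncio.sleep(0) yields control back to the loop once) is reported by cProfile with 35 calls, just like do_http_request above:

```python
import asyncio
import cProfile
import pstats

async def chatty():
    # each suspension hands control back to the event loop;
    # every resume is recorded by cProfile as a new call
    for _ in range(34):
        await asyncio.sleep(0)

profiler = cProfile.Profile()
profiler.enable()
asyncio.run(chatty())
profiler.disable()

nc = next(
    value[1] for key, value in pstats.Stats(profiler).stats.items()
    if key[2] == "chatty"
)
print(nc)  # 35: one initial call plus 34 resumes
```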

While the details may be tricky to understand, the summary is simple: profiling an async application and getting relevant output requires some additional effort. This is also true for sampling profilers.

In the next article, we’ll see how Blackfire can go way further.

Subscribe to Blackfire Monitoring

Start a Blackfire Monitoring subscription today and set up continuous observability of your Python applications.

Happy Performance Optimization!

Sümer Cip

Sümer Cip is currently a Senior Software Engineer at Blackfire. He has been coding for nearly 20 years and contributed to many open source projects in his career. He has a deep love for Python and obsession with code performance and low-level stuff.