Profiling ML Pipelines with Flame Graphs and line_profiler

Posted on Aug 9, 2021

In this post, I want to consolidate what I learned about profiling and lay out a good sequence of steps for tracking down performance bugs. This will help me (and maybe you) debug similar problems faster in the future. My takeaway in a nutshell:

Don’t jump to conclusions; instead, cast a wide net with the highly recommended py-spy package.

Background

A few weeks ago I faced a performance problem in our recommender systems application that was particularly hard to locate¹. Eventually, after trying out several profiling and debugging techniques in a semi-random order, I figured out the root cause: many invocations of a NumPy method whose efficiency had decreased after an update to NumPy.

Checking Time Spent Between Log Messages

The application logs are the first thing I check when I see an unexplained increase in execution time. In this case, I found that the time between a log message right before scoring our model and one right after had increased significantly with the last deployment to production. Running the application in debug mode with more log messages helped to narrow down the offending code even further. However, checking log messages might not be enough. In my case, for example, the code between the messages contained a biggish loop, and we don’t log within that loop to avoid flooding our logs with thousands of messages.
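
To illustrate, here is a minimal, hypothetical sketch of such a timed scoring step (the score_batch function, model, and features below are made up for illustration, not our actual application code); the elapsed time between the two log messages is what I compared across deployments:

import logging
import time

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
logger = logging.getLogger(__name__)

def score_batch(model, features):
    # Hypothetical scoring step standing in for the real recommender code.
    logger.info("Scoring started")
    start = time.perf_counter()
    predictions = model.predict(features)
    logger.info("Scoring finished after %.2f seconds", time.perf_counter() - start)
    return predictions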

Checking Flame Graphs for your Application

After identifying the scoring part of our application, I thought I knew exactly where the problem was. I jumped straight into a Jupyter Notebook and used the [line_profiler package][lp] to check the execution time of single functions and lines of code. This was a mistake: my assumptions were wrong, and the problem was in another part of the scoring code.

I should have assumed less and started by profiling bigger parts of the application with Flame Graphs. Flame Graphs are an awesome tool for debugging because they let you profile your whole application with minimal overhead in a production-like setting (or even in production). A simple Flame Graph might look like this:

Example py-spy output: a Flame Graph as created by the py-spy package. Source: https://github.com/benfred/py-spy

To read a Flame Graph, gauge the width of the rectangles: functions that take much longer than you would expect are often a good starting point (check out brendangregg.com/flamegraphs for more details).

To find where my application went wrong, I pulled the production Docker image that produced the slow execution time and the “fast” Docker image that ran in production before it. With the py-spy package by Ben Frederickson, it was simple to produce Flame Graphs for both images. Comparing them side by side, I was immediately able to pinpoint the function that was causing the increase in execution time (a function that invoked a specific NumPy method a lot). Additionally, Flame Graphs helped me better understand where our application spends most of its time, which might come in handy for future performance improvements.

Probably the simplest way to get started with py-spy in a Docker image is to install it ad hoc, like so:

docker run \
    --cap-add SYS_PTRACE \
    -v "$(pwd):/usr/local/data" \
    slow-prod-image \
    /bin/bash -c \
    "pip install py-spy && py-spy record --native -o /usr/local/data/slow-prod-profile.svg -- python application.main"

For details on the --cap-add SYS_PTRACE flag, check out py-spy’s readme section on Docker. If you want to profile multiple images with different settings, I’d recommend using docker-compose to keep your sanity.

Diving Deep and Iterating Quickly with the line_profiler Package

After identifying the problematic code, I jumped into refactoring and testing the code interactively in a Jupyter Notebook. Here, the line_profiler package and its %lprun magic came in very handy. Check out the chapter on profiling and timing code in Jake VanderPlas’s Python Data Science Handbook.
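
A rough sketch of that workflow (the score_items function and random data below are made up for illustration, not our actual code): load the extension, then point %lprun at the function you want line-by-line timings for:

# In a Jupyter Notebook cell; requires `pip install line_profiler`
%load_ext line_profiler

import numpy as np

def score_items(user_vector, item_matrix):
    # Hypothetical scoring function standing in for the real code.
    scores = item_matrix @ user_vector
    return np.argsort(scores)[::-1]

user_vector = np.random.rand(128)
item_matrix = np.random.rand(10_000, 128)

# Report line-by-line timings for score_items while executing the call below.
%lprun -f score_items score_items(user_vector, item_matrix)

The resulting report shows how much time each line of score_items takes, which makes it quick to iterate on a fix and immediately see its effect.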

Summary

Next time I encounter a performance issue that is not obvious from the application logs, I’ll start my profiling journey with py-spy and only jump into line_profiler if I need to.


  1. Some packages needed to be updated together: mutually dependent packages, plus a package that had to be updated because of a bug in earlier versions.

[lp]: https://github.com/pyutils/line_profiler