Optimizing Django ORM QueriesDjango is a fantastic tool to quickly create an admin dashboard for managing entities.  But the magic of brevity oftentimes comes at the expense of suboptimal performance. The Django ORM is a coin with two sides.  It simplifies work with complex datasets and makes it easy and more intuitive to create queries. Unfortunately, it obfuscates what’s going on under the hood, leaving an “open door” for inefficient queries to go into your app. This slowdown can be invisible for the first time, but as your app starts to scale, these minor inefficiencies may become a performance problem.

Continuous review is the best practice to catch this type of slowdown before it ever makes its way into production code. When creating the Django project, it’s good to develop a habit of checking whether the queries deliver on the performance targets.  If you want to know what Django does on under the hood, you can use DEBUG mode or even “Django-debug-toolbar.”

But the smarter way is to use Percona Monitoring and Management (PMM) to keep control over how your app works with DB. PMM plays the role of a microscope to identify small issues early that can become a massive problem on a big production scale.  It gives you a general profile of your application queries with detailed metrics, query explains, indexes, and table stats. And, most importantly, it can be run in production to monitor and react on issues that become visible on a big scale.

Let’s Analyze a Simple Example of Using PMM

We have a service to manage IoT sensors in the apartment with the following dependencies Rooms 1 <-> n Sensors. Here we use the admin part of Django only. For illustration, it requires less code and is commonly used for creating small and big back offices.

Let’s install PMM Server and point it on our DB:

 

So we will have this view:

Django ORM Queries with Percona Monitoring and Management

Now, let’s take a look at the myriad of queries and metrics that PMM organized for us, and find a query that selects values for the sensor’s view.

Query Analytics Dashboard Percona Monitoring and Management

Oh, the first we can find is Room Description which is not lazy joined to Sensors.

I only once opened the sensor’s view for the last five minutes and in this period we can see two query classes related to this view.

The first SELECT … FROM sensors – select sensors ids and names appear one time and the second SELECT … FROM rooms – select Room Description appears four times, ones for each sensor. This is called “The N+1 problem”.

This is probably the most common issue that new users of Django usually face: select related objects in a single query.

Let’s tell Django to retrieve each room’s name and area in the same query. Property list_select_related = True add to SensorAdmin class.
Now, admin.py looks like:

Here we can see how the new query for the sensor’s view looks:

PMM Query Analytics: query with select_related

It appears only once per page request.

Let’s investigate this query more deeply about how it executes with the search parameter and what we can improve. We have a property search_fields = (‘^name’,) of SensorAdmin class, which allows us to filter sensors by the first part of names.

Filtered list of sensors
And what gets our attention in query details? It is the ratio of “Rows examined” to “Rows sent.” We expect “Rows sent” to be close to “Rows examined” because otherwise, it means that a query handles many rows (“examined”) which are not used in the final result set (“sent”). It means wasted CPU cycles and even unnecessary IOs if rows are not in memory.

PMM: Query Details

So here we have a room for improvement that speeds up search on a big dataset. There is no general rule when, and how, we need to add an index, as it hugely depends on data type, variability, on insert/select ratio and also on application usage at all.

But let’s add an index here, assuming that it is appropriate.

Let’s add db_index=True to Sensor.name property and apply migration:

As a result, we have reduced the amount of examined rows from five to three per row sent:

PMM: Query Details

So, in conclusion: with the continuous review, we can see the trends and effects of our optimizations on PMM charts even on small datasets that we usually use during development, and keep the performance up of production deployment.

1 Comment
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
Uli

If you start to optimize Django it’s like watching the movie Groundhog Day. The day starts by a unpleasent meeting with your DBA that tries to kill you and a tester that complains about performance with many data. Then you optimize the whole day to survive. At the end of the day you want to develop something new and the whole process starts gain.

If you want to do it right, start writing the whole query by hand. Use select_related