updated May 29, 2019

Django ORM performance tips

The article explains how to use QuerySet API to produce less SQL queries or more effecient SQL so the database operations work faster.

0. How to check SQL produced by QuerySet

from django.db import connection

users = User.objects.all()
if users:
    user = users[0]
    
print(len(connection.queries))  # output: 2

1. Avoid objects existance check before getting it

Bad code: 2 SQL queries:

users = User.objects.all()
if users:  # 1st SQL execution
    user = users[0]  # 2nd SQL execution

Good code: one SQL with LIMIT 1:

user = User.objects.first()

users = User.objects.all()
try:
    user = users[0]
except IndexError:
    pass

2. Store query set in variable if iterate several times over it

Bad code: each iteration executes SQL query:

for user in User.objects.all():  # 1st SQL execution
    print(user)
for user in User.objects.all():  # 2nd SQL execution
    print(user)

Good code: cache query set in a variable:

users = User.objects.all()
for user in users:  # execute SQL
    print(user)
for user in users:  # iterate over cached objects
    print(user)

3. Prefer `exists()` to `count()` to check if the objects exist

Bad code: scans all the database rows that match QuerySet criteria:

if User.objects.count() == 0:
    print("No users")

Good code: SQL scans the 1st row only:

if not User.objects.exists():
    print("No users")

4. Foreign key property vs foreign object id property

Bad code: makes extra SQL query to load the author object:

author_id = article.author.id

Good code: no extra SQL:

author_id = article.author_id

5. `select_related()`: load foreign key objects in one SQL query instead `N + 1`

Bad code: makes extra SQL queries to load author properties: one query per article so the total queries count is 1+ len(articles):

for article in Article.objects.all():
    print(article.author.name)  # extra SQL query for each article to load the author

Good code: one SQL that joins the author table:

for article in Article.objects.select_related('author'):
    print(article.author.name)  # no extra SQL query

6. `prefetch_related()`: load many-to-many relations in 2 queries instead `N + 1`

Bad code: makes extra SQL query to load the comments for each article so the total queries count is 1 + len(articles):

for article in Article.objects.all():
    for comment in article.comments.all():
        print(comment.text)

Good code: 2 SQL queries that load articles and comments:

for article in Article.objects.prefetch_related('comments'):  
    for comment in article.comments.all():
        print(comment.text)

7. Return object properties values tuples instead objects

Load only requested properties values without model object creation:

names = User.objects.values_list('first_name', 'last_name')    # (str, str) tuple list
names = User.objects.values_list('first_name')                 # (str, ) tuple list

Use flat=True to return values instead tuples with one value:

names = User.objects.values_list('first_name', flat=True)      # str list
user_ids = User.objects.filter(rating=5).values_list('id', flat=True)

8. Subquery vs extra SQL query

One SQL with subquery for users:

users = User.objects.filter(rating=5)
Comment.object.filter(author__in=users)  # SQL with sub-query for users

Two SQL queries but they are simpler, and their total execution time could be less than for the complex query above if the users count is small:

user_ids = list(User.objects.filter(rating=5).values_list('id', flat=True))
Comment.object.filter(author__id__in=user_ids)