How to scale a Django Application on PythonAnywhere with Memcache

This guide shows how to create a simple Django 2.1 application on PythonAnywhere and then add Memcache to alleviate a performance bottleneck.

We’ll walk you through creating the application from start to finish, but you can view the finished product source code here.

This article mainly targets Python 3 since Django 2+ no longer supports Python 2. However, if you want to use Python 2 with an older version of Django this guide should still work.

Memcache is a technology that improves the performance and scalability of web apps and mobile app backends. You should consider using Memcache when your pages are loading too slowly or your app is having scalability issues. Even for small sites, Memcache can make page loads snappy and help future-proof your app.

Prerequisites

Before you complete the steps in this guide, make sure you have all of the following:

How to create and edit files

Whenever you need to create and edit files on PythonAnywhere, there are two ways to do this. You can either do it from a bash terminal, using your favorite editor, or you can create and open files from within the Files tab, which can be accessed from your dashboard. Feel free to use whichever option you prefer.

Create a Django application on PythonAnywhere

PythonAnywhere can create a new “Hello World” Django app for you within the global Python environment. However, for this tutorial we will use an isolated virtualenv, which makes it easier to add other apps in the future if required. To get started, log into your PythonAnywhere account, open a new bash terminal, and enter the following commands:

$ mkdir django_memcache && cd django_memcache
$ python3 -m venv venv    # For Python 2 use `virtualenv venv`
$ source venv/bin/activate
(venv) $ pip install Django
(venv) $ django-admin startproject django_tasklist .

In order to run this skeleton app, we need to do three things on PythonAnywhere:

  1. Add a new web app in the Web tab (you can get there from your PythonAnywhere dashboard). Select the app’s domain name, choose Manual configuration and a reasonable Python version like 3.6, and let PythonAnywhere create the WSGI file for you. To finish the manual configuration, enter the source code path (/home/<username>/django_memcache) and the virtualenv path (/home/<username>/django_memcache/venv) on the Web app configuration page.

  2. Open the file /var/www/<domain-name>_wsgi.py and delete everything in it. Then add the following:

    import os
    import sys
    
    project_home = os.path.expanduser('~/django_memcache')
    
    # Add your project to sys.path
    if project_home not in sys.path:
        sys.path.insert(0, project_home)
    
    # Set environment variable to tell django where your settings.py is
    os.environ['DJANGO_SETTINGS_MODULE'] = 'django_tasklist.settings'
    
    # Serve Django via WSGI
    from django.core.wsgi import get_wsgi_application
    application = get_wsgi_application()

  3. Set ALLOWED_HOSTS = ['*'] in django_tasklist/settings.py.

Now you can reload your app from the Web tab and visit it. You should see the default Django page with the cute little rocket.

Add task list functionality

The Django application we are building is a task list. In addition to displaying the list, it will have actions to add new tasks and to remove them. To accomplish this, we need to:

  1. Create a task list app
  2. Set up the database
  3. Create a Task model
  4. Create the route, view, and controller logic

Create a task list app

Django has the concept of apps and we need to create one in order to add any functionality. We will create a mc_tasklist app:

(venv) $ python manage.py startapp mc_tasklist

Add mc_tasklist to the list of installed apps in django_tasklist/settings.py:

INSTALLED_APPS = [
    'django.contrib.admin',
    # ...
    'mc_tasklist',
]

Set up a MySQL database

We need to create a database before we can configure it in Django. On PythonAnywhere, you can add a free MySQL database to your app from the Databases tab. Create a new MySQL service and add a database called task_list.

If you prefer to use PostgreSQL, even better. The setup is very similar. We use MySQL here because it is available for free.

To use our database in Django, we need to install mysqlclient:

(venv) $ pip install mysqlclient

Finally, configure the database in django_tasklist/settings.py, replacing the existing SQLite configuration:

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': '<username>$task_list',
        'USER': '<username>',
        'PASSWORD': '<mysql_password>',
        'HOST': '<username>.mysql.pythonanywhere-services.com',
        'OPTIONS': {
            'init_command': "SET sql_mode='STRICT_TRANS_TABLES'",
        },
    }
}

Replace <username> and <mysql_password> with your values. If you plan to commit this code to version control, read the password from an environment variable instead of hard-coding it.
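One way to keep the password out of settings.py is sketched below. It assumes a hypothetical environment variable named MYSQL_PASSWORD (the name is only an illustration, not something PythonAnywhere or Django defines) that you set yourself, for example in /var/www/<domain-name>_wsgi.py before Django is loaded. Commands run from a bash console, such as manage.py migrate, would need the same variable exported in that shell.

```python
import os

# MYSQL_PASSWORD is a hypothetical variable name chosen for this sketch.
# The empty fallback keeps settings.py importable when it is not set.
MYSQL_PASSWORD = os.environ.get('MYSQL_PASSWORD', '')

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': '<username>$task_list',
        'USER': '<username>',
        'PASSWORD': MYSQL_PASSWORD,
        'HOST': '<username>.mysql.pythonanywhere-services.com',
        'OPTIONS': {
            'init_command': "SET sql_mode='STRICT_TRANS_TABLES'",
        },
    }
}
```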

Create the Task model

To create and store tasks, we need to do two things:

  1. Create a simple Task model in mc_tasklist/models.py:

    from django.db import models
    
    class Task(models.Model):
        name = models.TextField()

  2. Use makemigrations to create a migration for the mc_tasklist app, then use migrate to create the mc_tasklist_task table along with all other default Django tables:

    (venv) $ python manage.py makemigrations mc_tasklist
    (venv) $ python manage.py migrate
    Operations to perform:
      Apply all migrations: admin, auth, contenttypes, mc_tasklist, sessions
    Running migrations:
      Applying contenttypes.0001_initial... OK
      Applying auth.0001_initial... OK
      Applying admin.0001_initial... OK
      Applying admin.0002_logentry_remove_auto_add... OK
      Applying admin.0003_logentry_add_action_flag_choices... OK
      Applying contenttypes.0002_remove_content_type_name... OK
      Applying auth.0002_alter_permission_name_max_length... OK
      Applying auth.0003_alter_user_email_max_length... OK
      Applying auth.0004_alter_user_username_opts... OK
      Applying auth.0005_alter_user_last_login_null... OK
      Applying auth.0006_require_contenttypes_0002... OK
      Applying auth.0007_alter_validators_add_error_messages... OK
      Applying auth.0008_alter_user_username_max_length... OK
      Applying auth.0009_alter_user_last_name_max_length... OK
      Applying mc_tasklist.0001_initial... OK
      Applying sessions.0001_initial... OK

Create the task list application

The actual application consists of a view that is displayed in the front end and a controller that implements the functionality in the back end. You also need to tell Django which controller corresponds to which URL.

  1. Set up the routes for add, remove, and index methods in django_tasklist/urls.py:

    # ...
    from mc_tasklist import views
    urlpatterns = [
        # ...
        path('add', views.add),
        path('remove', views.remove),
        path('', views.index),
    ]

  2. Add corresponding view controllers in mc_tasklist/views.py:

    from django.template.context_processors import csrf
    from django.shortcuts import render_to_response, redirect
    from mc_tasklist.models import Task
    
    def index(request):
        tasks = Task.objects.order_by("id")
        c = {'tasks': tasks}
        c.update(csrf(request))
        return render_to_response('index.html', c)
    
    def add(request):
        item = Task(name=request.POST["name"])
        item.save()
        return redirect("/")
    
    def remove(request):
        item = Task.objects.get(id=request.POST["id"])
        if item:
            item.delete()
        return redirect("/")

  3. Create a template with display code in mc_tasklist/templates/index.html:

    <!DOCTYPE html>
    <html>
    <head>
      <meta charset="utf-8">
      <title>MemCachier Django tutorial</title>
      <!-- Fonts -->
      <link href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.4.0/css/font-awesome.min.css"
            rel='stylesheet' type='text/css' />
      <!-- Bootstrap CSS -->
      <link href="https://maxcdn.bootstrapcdn.com/bootstrap/4.0.0/css/bootstrap.min.css"
            rel="stylesheet" />
    </head>
    
    <body>
      <div class="container">
        <!-- New Task Card -->
        <div class="card">
          <div class="card-body">
            <h5 class="card-title">New Task</h5>
    
            <form action="add" method="POST">
              {% csrf_token %}
              <div class="form-group">
                <input type="text" class="form-control" placeholder="Task Name"
                       name="name" required>
              </div>
              <button type="submit" class="btn btn-default">
                <i class="fa fa-plus"></i> Add Task
              </button>
            </form>
          </div>
        </div>
    
        <!-- Current Tasks -->
        {% if tasks %}
        <div class="card">
          <div class="card-body">
            <h5 class="card-title">Current Tasks</h5>
    
            <table class="table table-striped">
              {% for task in tasks %}
              <tr>
                <!-- Task Name -->
                <td class="table-text">{{ task.name }}</td>
                <!-- Delete Button -->
                <td>
                  <form action="remove" method="POST">
                    {% csrf_token %}
                    <input type="hidden" name="id" value="{{ task.id }}">
                    <button type="submit" class="btn btn-danger">
                      <i class="fa fa-trash"></i> Delete
                    </button>
                  </form>
                </td>
              </tr>
              {% endfor %}
            </table>
          </div>
        </div>
        {% endif %}
      </div>
    
      <!-- Bootstrap related JavaScript -->
      <script src="https://code.jquery.com/jquery-3.2.1.slim.min.js"></script>
      <script src="https://cdnjs.cloudflare.com/ajax/libs/popper.js/1.12.9/umd/popper.min.js"></script>
      <script src="https://maxcdn.bootstrapcdn.com/bootstrap/4.0.0/js/bootstrap.min.js"></script>
    </body>
    </html>

    The view consists of two cards: one that contains a form to create new tasks, and another that contains a table with existing tasks and a delete button associated with each task.

    Note that Django will automatically check each app's templates folder for templates.

Our task list is now functional. Reload the app from the Web tab and test it by adding a few tasks. With this complete, we can learn how to improve its performance with Memcache.

Add caching to Django

Memcache is an in-memory, distributed cache. Its primary API consists of two operations: SET(key, value) and GET(key). Memcache is like a hashmap (or dictionary) that is spread across multiple servers, where operations are still performed in constant time.
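To make the hashmap analogy concrete, here is a minimal in-process sketch. It is not how Memcached is implemented: the ToyMemcache class and its methods are invented for illustration, each "server" is just a Python dict, and keys are assigned to servers with simple modulo hashing.

```python
import hashlib


class ToyMemcache:
    """In-process stand-in for a Memcache cluster (illustration only)."""

    def __init__(self, server_count=3):
        # Each "server" is just a dict here.
        self.servers = [{} for _ in range(server_count)]

    def _server_for(self, key):
        # Hash the key to pick a server. This is plain modulo hashing,
        # which is simpler than the consistent hashing real clients offer.
        digest = hashlib.sha256(key.encode('utf-8')).hexdigest()
        return self.servers[int(digest, 16) % len(self.servers)]

    def set(self, key, value):
        self._server_for(key)[key] = value

    def get(self, key):
        # A miss returns None.
        return self._server_for(key).get(key)


cache = ToyMemcache()
cache.set('tasks.all', ['write guide', 'add cache'])
hit = cache.get('tasks.all')      # the stored list
miss = cache.get('no.such.key')   # None
```

Both operations hash the key once and touch a single dict, which is why they stay constant time no matter how many servers there are.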

The most common use for Memcache is to cache the results of expensive database queries and HTML renders so that these expensive operations don’t need to happen over and over again.

Provision a Memcache

To use Memcache in Django, you first need to provision an actual Memcached cache. You can easily get one for free from MemCachier. This allows you to just use a cache without having to set up and maintain actual Memcached servers yourself.

There are three config variables you’ll need for your application to be able to connect to your cache: MEMCACHIER_SERVERS, MEMCACHIER_USERNAME, and MEMCACHIER_PASSWORD. Get them from your MemCachier dashboard and add them to your /var/www/<domain-name>_wsgi.py file:

# ...
os.environ['MEMCACHIER_USERNAME'] = '<cache-username>'
os.environ['MEMCACHIER_PASSWORD'] = '<cache-password>'
os.environ['MEMCACHIER_SERVERS'] = '<cache-servers>'

# Serve Django via WSGI
# ...

Configure Django with MemCachier

Django requires pylibmc to connect to the Memcache server:

(venv) $ pip install pylibmc

As of Django 1.11 we can use its native pylibmc backend. For older versions of Django you will need to install django-pylibmc.

Configure your cache by adding the following to the end of django_tasklist/settings.py:

def get_cache():
  import os
  try:
    servers = os.environ['MEMCACHIER_SERVERS']
    username = os.environ['MEMCACHIER_USERNAME']
    password = os.environ['MEMCACHIER_PASSWORD']
    return {
      'default': {
        'BACKEND': 'django.core.cache.backends.memcached.PyLibMCCache',
        # TIMEOUT is not the connection timeout! It's the default expiration
        # timeout that should be applied to keys! Setting it to `None`
        # disables expiration.
        'TIMEOUT': None,
        'LOCATION': servers,
        'OPTIONS': {
          'binary': True,
          'username': username,
          'password': password,
          'behaviors': {
            # Enable faster IO
            'no_block': True,
            'tcp_nodelay': True,
            # Keep connection alive
            'tcp_keepalive': True,
            # Timeout settings
            'connect_timeout': 2000, # ms
            'send_timeout': 750 * 1000, # us
            'receive_timeout': 750 * 1000, # us
            '_poll_timeout': 2000, # ms
            # Better failover
            'ketama': True,
            'remove_failed': 1,
            'retry_timeout': 2,
            'dead_timeout': 30,
          }
        }
      }
    }
  except KeyError:  # at least one MEMCACHIER_* variable is not set
    return {
      'default': {
        'BACKEND': 'django.core.cache.backends.locmem.LocMemCache'
      }
    }

CACHES = get_cache()

This configures the cache for both development and production. If the MEMCACHIER_* environment variables exist, the cache is set up with pylibmc and connects to MemCachier. If they do not exist, which we take to mean development mode, Django’s simple in-memory cache is used instead.

Cache expensive database queries

Memcache is often used to cache expensive database queries. This simple example doesn’t include any expensive queries, but for the sake of learning, let’s assume that getting all tasks from the database is an expensive operation.

The task list database query code in mc_tasklist/views.py can be modified to check the cache first like so:

# ...
from django.core.cache import cache
import time

TASKS_KEY = "tasks.all"

def index(request):
    tasks = cache.get(TASKS_KEY)
    if tasks is None:  # None signals a cache miss; an empty list is a valid hit
        time.sleep(2)  # simulate a slow query.
        tasks = Task.objects.order_by("id")
        cache.set(TASKS_KEY, tasks)
    c = {'tasks': tasks}
    c.update(csrf(request))
    return render_to_response('index.html', c)
# ...

The above code first checks the cache to see if the tasks.all key exists in the cache. If it does not, a database query is executed and the cache is updated. Subsequent pageloads will not need to perform the database query. The time.sleep(2) only exists to simulate a slow query.

Reload the app and test the new functionality. To see what’s going on in your cache, open the MemCachier dashboard for your cache.

The first time you load the task list, the get misses and set counters should increase. Every subsequent reload of the task list should increase get hits (refresh the stats in the dashboard to see this).
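The counter behavior described above can be reproduced without a real cache. The sketch below uses a dict-backed stand-in; CountingCache, slow_query, and load_tasks are hypothetical names for this illustration, and the counters only mimic what a stats dashboard reports.

```python
class CountingCache:
    """Dict-backed cache that counts commands like a stats dashboard."""

    def __init__(self):
        self.data = {}
        self.stats = {'get_hits': 0, 'get_misses': 0, 'sets': 0}

    def get(self, key):
        if key in self.data:
            self.stats['get_hits'] += 1
            return self.data[key]
        self.stats['get_misses'] += 1
        return None

    def set(self, key, value):
        self.stats['sets'] += 1
        self.data[key] = value


def slow_query():
    # Stands in for Task.objects.order_by("id").
    return ['task 1', 'task 2']


def load_tasks(cache):
    tasks = cache.get('tasks.all')
    if tasks is None:  # miss: run the query and fill the cache
        tasks = slow_query()
        cache.set('tasks.all', tasks)
    return tasks


cache = CountingCache()
for _ in range(3):  # three page loads
    load_tasks(cache)
print(cache.stats)  # {'get_hits': 2, 'get_misses': 1, 'sets': 1}
```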

Our cache is working, but there is still a major problem. Add a new task and see what happens. No new task appears on the current tasks list! The new task was created in the database, but the app is serving the stale task list from the cache.

Clear stale data

There are many techniques for dealing with an out-of-date cache.

  1. Expiration: The easiest way to make sure the cache does not get stale is by setting an expiration time. The cache.set method can take an optional third argument, which is the time in seconds that the cache key should stay in the cache. If this option is not specified, the default TIMEOUT value in settings.py will be used instead.

    You could modify the cache.set method to look like this:

    cache.set(TASKS_KEY, tasks, 5)

    However, a fixed expiration only works when you know how long a cached value stays valid. In our case, the cache becomes stale whenever a user adds or removes a task.

  2. Delete cached value: A straightforward strategy is to invalidate the tasks.all key whenever you know the cache is out of date. Here, that means modifying the add and remove views to delete the tasks.all key:

    # ...
    def add(request):
        item = Task(name=request.POST["name"])
        item.save()
        cache.delete(TASKS_KEY)
        return redirect("/")
    
    def remove(request):
        item = Task.objects.get(id=request.POST["id"])
        if item:
            item.delete()
            cache.delete(TASKS_KEY)
        return redirect("/")

  3. Key-based expiration: Another technique to invalidate stale data is to change the key:

    # ...
    import random
    import string
    
    def _hash(size=16, chars=string.ascii_letters + string.digits):
        return ''.join(random.choice(chars) for _ in range(size))
    
    def _new_tasks_key():
        return 'tasks.all.' + _hash()
    
    TASKS_KEY = _new_tasks_key()
    
    # ...
    
    def add(request):
        item = Task(name=request.POST["name"])
        item.save()
        global TASKS_KEY
        TASKS_KEY = _new_tasks_key()
        return redirect("/")
    
    def remove(request):
        item = Task.objects.get(id=request.POST["id"])
        if item:
            item.delete()
            global TASKS_KEY
            TASKS_KEY = _new_tasks_key()
        return redirect("/")

    The upside of key-based expiration is that you do not have to interact with the cache to expire the value. The LRU eviction of Memcache will clean out the old keys eventually.

  4. Update cache: Instead of invalidating the key, the value can also be updated to reflect the new task list:

    # ...
    def add(request):
        item = Task(name=request.POST["name"])
        item.save()
        cache.set(TASKS_KEY, Task.objects.order_by("id"))
        return redirect("/")
    
    def remove(request):
        item = Task.objects.get(id=request.POST["id"])
        if item:
            item.delete()
            cache.set(TASKS_KEY, Task.objects.order_by("id"))
        return redirect("/")

    Updating the value instead of deleting it allows the next page load to avoid going to the database.

You can use option 2, 3, or 4 to make sure the cache will not ever be out-of-date. As usual, reload the app afterwards.

Now when you add a new task, all the tasks you’ve added since implementing caching will appear.
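The difference between waiting for an expiration (option 1) and deleting on write (option 2) can be seen in a small stand-alone sketch. ExpiringCache is a hypothetical dict-backed class written for this illustration, and it takes a fake clock so the example does not have to sleep.

```python
import time


class ExpiringCache:
    """Dict-backed cache with per-key expiration (illustration only)."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock  # injectable so the demo needs no sleep
        self.data = {}      # key -> (value, expires_at or None)

    def set(self, key, value, timeout=None):
        expires_at = None if timeout is None else self.clock() + timeout
        self.data[key] = (value, expires_at)

    def get(self, key):
        value, expires_at = self.data.get(key, (None, None))
        if expires_at is not None and self.clock() >= expires_at:
            del self.data[key]  # expired: treat as a miss
            return None
        return value

    def delete(self, key):
        self.data.pop(key, None)


now = [0.0]  # fake clock, in seconds
cache = ExpiringCache(clock=lambda: now[0])

# Option 1: the old value stays visible until the timeout passes.
cache.set('tasks.all', ['old task'], 5)
now[0] = 4.0
stale_read = cache.get('tasks.all')    # still ['old task']
now[0] = 5.0
expired_read = cache.get('tasks.all')  # None

# Option 2: deleting on write removes the value immediately.
cache.set('tasks.all', ['old task'], 5)
cache.delete('tasks.all')
deleted_read = cache.get('tasks.all')  # None
```

With expiration alone, a reader can see the old list for up to the full timeout after a task is added. Deleting the key closes that window at the cost of one extra cache command per write.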

Use Django’s integrated caching

Django also has a few built-in ways to use Memcache to improve performance. These mainly target HTML rendering, which is a CPU-intensive operation.

Caching and CSRF

You cannot cache any views or fragments that contain forms with CSRF tokens because the token changes with each request. For the sake of learning how to use Django’s integrated caching we will disable Django’s CSRF middleware. Since this task list is public, this is not a big deal but do not do this in any serious production application.

Comment out CsrfViewMiddleware in django_tasklist/settings.py:

MIDDLEWARE = [
    # ...
    # 'django.middleware.csrf.CsrfViewMiddleware',
    # ...
]

Cache template fragments

Django allows you to cache rendered template fragments. This is similar to snippet caching in Flask, or caching rendered partials in Laravel. To enable fragment caching add {% load cache %} to the top of your template.

Do not cache fragments that include forms with CSRF tokens.

To cache a rendered set of task entries, we use a {% cache timeout key %} statement in mc_tasklist/templates/index.html:

{% load cache %}
<!-- ... -->

<table class="table table-striped">
  {% for task in tasks %}
    {% cache None 'task-fragment' task.id %}
    <tr>
      <!-- ... -->
    </tr>
    {% endcache %}
  {% endfor %}
</table>

<!-- ... -->

Here the timeout is None and the key is a list of strings that will be concatenated. As long as task IDs are never reused, this is all there is to caching rendered snippets. The MySQL database we use on PythonAnywhere does not reuse IDs, so we’re all set.

If you use a database that does reuse IDs, you need to delete the fragment when its respective task is deleted. You can do this by adding the following code to the task deletion logic:

from django.core.cache.utils import make_template_fragment_key
# Build the key before calling item.delete(), which resets item.id to None.
key = make_template_fragment_key("task-fragment", vary_on=[str(item.id)])
cache.delete(key)

Let’s see the effect of caching the fragments in our application. You should now observe an additional get hit for each task in your list whenever you reload the page (except the first reload).

Cache entire views

We can go one step further and cache entire views instead of fragments. This should be done with care, because it can result in unintended side effects if a view frequently changes or contains forms for user input. In our task list example, both of these conditions are true because the task list changes each time a task is added or deleted, and the view contains forms to add and delete a task.

Do not cache views that include forms with CSRF tokens.

You can cache the task list view with the @cache_page(timeout) decorator in mc_tasklist/views.py:

# ...
from django.views.decorators.cache import cache_page

@cache_page(None)
def index(request):
    # ...

# ...

Because the view changes whenever we add or remove a task, we need to delete the cached view whenever this happens. This is not straightforward: we need to learn the cache key when the view is cached so that we can delete it later:

# ...
from django.utils.cache import learn_cache_key

VIEW_KEY = ""

@cache_page(None)
def index(request):
    # ...
    response = render_to_response('index.html', c)
    global VIEW_KEY
    VIEW_KEY = learn_cache_key(request, response)
    return response

def add(request):
    # ...
    cache.delete(VIEW_KEY)
    return redirect("/")

def remove(request):
    item = Task.objects.get(id=request.POST["id"])
    if item:
        # ...
        cache.delete(VIEW_KEY)
    return redirect("/")

To see the effect of view caching, reload your application. On the first refresh, you should see the get hit counter increase according to the number of tasks you have, as well as an additional get miss and set, which correspond to the view that is now cached. Any subsequent reload will increase the get hit counter by just two, because the entire view is retrieved with two get commands.

Note that view caching does not obsolete the caching of expensive operations or template fragments. It is good practice to cache smaller operations within cached larger operations, or smaller fragments within larger fragments. This technique (called Russian doll caching) helps with performance if a larger operation, fragment, or view is removed from the cache, because the building blocks do not have to be recreated from scratch.
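The benefit of nesting can be sketched with a dict-backed stand-in for the cache. Everything here is hypothetical and only mirrors the idea: CountingCache, the key names, and the render functions are not Django APIs.

```python
class CountingCache:
    """Dict-backed cache that counts how often a value must be rebuilt."""

    def __init__(self):
        self.data = {}
        self.misses = 0

    def get_or_build(self, key, build):
        # Return the cached value, or build and store it on a miss.
        if key not in self.data:
            self.misses += 1
            self.data[key] = build()
        return self.data[key]


def render_row(cache, task_id, name):
    # Inner doll: one cached fragment per task.
    return cache.get_or_build(
        'task-fragment.%d' % task_id,
        lambda: '<tr><td>%s</td></tr>' % name)


def render_page(cache, tasks):
    # Outer doll: the whole page, assembled from cached fragments.
    return cache.get_or_build(
        'view.index',
        lambda: '<table>%s</table>' % ''.join(
            render_row(cache, task_id, name) for task_id, name in tasks))


cache = CountingCache()
tasks = [(1, 'write guide'), (2, 'add cache')]
first = render_page(cache, tasks)
misses_cold = cache.misses  # 3: the page plus two fragments

del cache.data['view.index']  # only the outer entry is evicted
second = render_page(cache, tasks)
misses_after_eviction = cache.misses - misses_cold  # 1: only the page
```

After the outer entry is evicted, rebuilding the page costs one miss instead of three, because both row fragments are still cached.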

Using Memcache for session storage

Memcache works well for storing information for short-lived sessions that time out. However, because Memcache is a cache and therefore not persistent, long-lived sessions are better suited to permanent storage options, such as your database.

For short-lived sessions configure SESSION_ENGINE to use the cache backend in django_tasklist/settings.py:

SESSION_ENGINE = 'django.contrib.sessions.backends.cache'

For long-lived sessions, Django allows you to use a write-through cache, backed by a database. This is the best option for performance while guaranteeing persistence. To use the write-through cache, configure the SESSION_ENGINE in django_tasklist/settings.py like so:

SESSION_ENGINE = 'django.contrib.sessions.backends.cached_db'
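The write-through idea can be sketched in a few lines of plain Python. This is not Django's implementation: WriteThroughSessions is a hypothetical class in which two dicts stand in for Memcache and the session table.

```python
class WriteThroughSessions:
    """Toy session store: a cache in front of a persistent store."""

    def __init__(self):
        self.cache = {}     # stands in for Memcache (may lose entries)
        self.database = {}  # stands in for the session table
        self.db_reads = 0

    def save(self, session_key, data):
        # Write-through: every save goes to both layers.
        self.database[session_key] = data
        self.cache[session_key] = data

    def load(self, session_key):
        if session_key in self.cache:
            return self.cache[session_key]
        # Cache miss: fall back to the database and refill the cache.
        self.db_reads += 1
        data = self.database.get(session_key)
        if data is not None:
            self.cache[session_key] = data
        return data


store = WriteThroughSessions()
store.save('abc123', {'user_id': 7})
fast = store.load('abc123')       # served from the cache
store.cache.clear()               # simulate an eviction or restart
recovered = store.load('abc123')  # served from the database
```

Reads normally never touch the database, yet a session survives the loss of the cache because every write was also persisted.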

For more information on how to use sessions in Django, please see the Django Session Documentation.

Further reading and resources