Deploy a Django Application on AWS Elastic Beanstalk and scale it with Memcache
This post is out of date. We published an updated version, read Deploy a Django Application on AWS Elastic Beanstalk and scale it with Memcache.
Want to deploy a Django application on AWS Elastic Beanstalk that is ready to scale? We’ll explore how to set up your Elastic Beanstalk environment, hook it up to a database, deploy your application, and finally how to use Memcache to speed it up.
We’ll walk you through creating the application from start to finish, but you can view the finished product source code here.
Memcache is a technology that improves the performance and scalability of web apps and mobile app backends. You should consider using Memcache when your pages are loading too slowly or your app is having scalability issues. Even for small sites, Memcache can make page loads snappy and help future-proof your app.
Prerequisites
Before you complete the steps in this guide, make sure you have all of the following:
- Familiarity with Python (and ideally Django)
- An AWS account. If you haven’t used AWS before, you can set up an account here.
- The AWS CLI installed and configured on your computer.
- Python, git, and the EB CLI installed on your computer.
Required Versions:
- Python 3.6
- Pip 18.0
Since Elastic Beanstalk has specific requirements, if you’re running a different version of Python on your machine, consider using a tool like pyenv, virtualenv, or Python’s own venv.
Create a Django application for Elastic Beanstalk
The following commands will create an isolated Python environment and bootstrap an empty Django app:
$ mkdir django_memcache && cd django_memcache
$ python -m venv venv
$ source venv/bin/activate
(venv) $ pip install Django
(venv) $ django-admin startproject django_tasklist .
(venv) $ python manage.py runserver
Performing system checks...
System check identified no issues (0 silenced).
...
Django version 2.2, using settings 'django_tasklist.settings'
Starting development server at http://127.0.0.1:8000/
Quit the server with CONTROL-C.
Visiting http://localhost:8000 will show a “hello, world” landing page.
Create an Elastic Beanstalk app
Associate your Django project with a new Elastic Beanstalk app with the following steps:
Use pip freeze to write your dependencies to a file named requirements.txt:
(venv) $ pip freeze > requirements.txt
This file is required in order for Elastic Beanstalk to know what to install during deployment.
Create an .ebextensions folder and add a django.config file:
(venv) $ mkdir .ebextensions
(venv) $ touch .ebextensions/django.config
Set the WSGIPath in .ebextensions/django.config so Elastic Beanstalk can start your application:
option_settings:
  aws:elasticbeanstalk:container:python:
    WSGIPath: django_tasklist/wsgi.py
Initialize a Git repository and commit the skeleton. Start by adding a .gitignore file to make sure you don’t commit files you don’t want to. Paste the following into it:
venv
*.pyc
db.sqlite3
Now commit all files to the Git repository:
$ git init
$ git add .
$ git commit -m 'Django skeleton'
Create an Elastic Beanstalk repo:
$ eb init -p python-3.6 django-memcache --region us-east-1
This will set up a new application called django-memcache. Feel free to use a different region. Then we’ll create an environment to run our application in:
$ eb create django-env -db.engine mysql -db.i db.t2.micro
Notice that we’re adding a MySQL database to our EB environment. You’ll be prompted for a username and password for the database. You can set them to whatever you like.
Be careful when choosing your password. AWS does not handle symbols very well (! $ @ etc.), and they can cause some unexpected behavior. Stick to letters and numbers, and make sure it’s at least eight characters long.
This will create an AWS Relational Database Service (RDS) instance that is associated with this application. When you terminate this application, the database instance will be destroyed as well. If you need an RDS instance that is independent of your Elastic Beanstalk application, create one via the AWS RDS interface.
This configuration process will take about five minutes. Go refill your coffee, stretch your legs, and come back later.
Configure Django for Elastic Beanstalk
You will need to make two changes to the vanilla Django skeleton for it to work on Elastic Beanstalk.
Allow any app domain name
You need to allow Django to run on any host. Do this by setting ALLOWED_HOSTS = ['*'] in django_tasklist/settings.py.
Note: once you have a domain name for your page, you should use that instead of a wildcard.
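One way to move off the wildcard later without editing the code again is to read the host list from an environment variable. The following is a minimal sketch, not part of the original tutorial: the variable name DJANGO_ALLOWED_HOSTS and the helper parse_allowed_hosts are hypothetical names chosen for illustration.

```python
import os


def parse_allowed_hosts(raw):
    """Split a comma-separated host list, ignoring blanks and whitespace."""
    hosts = [host.strip() for host in raw.split(",")]
    return [host for host in hosts if host]


# DJANGO_ALLOWED_HOSTS is a hypothetical variable name chosen for this sketch.
# Fall back to the wildcard used in this tutorial when it is not set.
ALLOWED_HOSTS = parse_allowed_hosts(os.environ.get("DJANGO_ALLOWED_HOSTS", "")) or ["*"]
```

Under this assumption, the variable could be set on the environment with eb setenv, in the same way the MemCachier variables are set later in this post.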
Set up the MySQL database
Django comes with SQLite configured by default. This will not work out of the box on Elastic Beanstalk. Since our EB environment already has a MySQL database initialized, we’ll configure this database in Django.
To use our database in Django, we need to install mysqlclient:
(venv) $ pip install mysqlclient
(venv) $ pip freeze > requirements.txt
Finally, configure the database in django_tasklist/settings.py (replace the current SQLite configuration):
def get_db():
    try:
        return {
            'default': {
                'ENGINE': 'django.db.backends.mysql',
                'NAME': os.environ['RDS_DB_NAME'],
                'USER': os.environ['RDS_USERNAME'],
                'PASSWORD': os.environ['RDS_PASSWORD'],
                'HOST': os.environ['RDS_HOSTNAME'],
                'PORT': os.environ['RDS_PORT'],
                'OPTIONS': {
                    'init_command': "SET sql_mode='STRICT_TRANS_TABLES'",
                },
            }
        }
    except KeyError:
        # The RDS_* variables are missing (local development): use SQLite.
        return {
            'default': {
                'ENGINE': 'django.db.backends.sqlite3',
                'NAME': 'db.sqlite3',
            }
        }

DATABASES = get_db()
Save the changes to git:
$ git add .
$ git commit -m 'Initial EB config'
Deploy the Django app on Elastic Beanstalk
Deploying the Django application on EB is easily done by running the deploy command:
(venv) $ eb deploy
You can now open the application and see if it’s working:
(venv) $ eb open
You should now see the same landing page with the little rocket as when you ran the Django app locally.
If you get a 500 error when you open the application, check the logs. They’re located in the EB console in the side menu labeled Logs.
Add task list functionality
The Django application we are building is a task list. In addition to displaying the list, it will have actions to add new tasks and to remove them. To accomplish this, we need to:
- Create a task list app
- Create a Task model
- Create the route, view, and controller logic
Create a task list app
Django has the concept of apps and we need to create one in order to add any functionality. We will create a mc_tasklist app:
(venv) $ python manage.py startapp mc_tasklist
Add mc_tasklist to the list of installed apps in django_tasklist/settings.py:
INSTALLED_APPS = [
    'django.contrib.admin',
    # ...
    'mc_tasklist',
]
Create the Task model
To create and store tasks, we need to do two things:
Create a simple Task model in mc_tasklist/models.py:
from django.db import models

class Task(models.Model):
    name = models.TextField()
Use makemigrations and migrate to create a migration for the mc_tasklist app as well as create the mc_tasklist_task table, along with all other default Django tables:
(venv) $ python manage.py makemigrations mc_tasklist
(venv) $ python manage.py migrate
Operations to perform:
  Apply all migrations: admin, auth, contenttypes, mc_tasklist, sessions
Running migrations:
  Applying contenttypes.0001_initial... OK
  Applying auth.0001_initial... OK
  Applying admin.0001_initial... OK
  Applying admin.0002_logentry_remove_auto_add... OK
  Applying admin.0003_logentry_add_action_flag_choices... OK
  Applying contenttypes.0002_remove_content_type_name... OK
  Applying auth.0002_alter_permission_name_max_length... OK
  Applying auth.0003_alter_user_email_max_length... OK
  Applying auth.0004_alter_user_username_opts... OK
  Applying auth.0005_alter_user_last_login_null... OK
  Applying auth.0006_require_contenttypes_0002... OK
  Applying auth.0007_alter_validators_add_error_messages... OK
  Applying auth.0008_alter_user_username_max_length... OK
  Applying auth.0009_alter_user_last_name_max_length... OK
  Applying mc_tasklist.0001_initial... OK
  Applying sessions.0001_initial... OK
To run the migrations when deploying on Elastic Beanstalk, create .ebextensions/task_list.config with the following content:
container_commands:
  01_migrate:
    command: "django-admin.py migrate"
    leader_only: true
option_settings:
  aws:elasticbeanstalk:application:environment:
    DJANGO_SETTINGS_MODULE: django_tasklist.settings
Create the task list application
The actual application consists of a view that is displayed in the front end and a controller that implements the functionality in the back end. You also need to tell Django which controller corresponds to which URL.
Set up the routes for the add, remove, and index methods in django_tasklist/urls.py:
# ...
from mc_tasklist import views

urlpatterns = [
    # ...
    path('add', views.add),
    path('remove', views.remove),
    path('', views.index),
]
Add corresponding view controllers in mc_tasklist/views.py:
from django.template.context_processors import csrf
from django.shortcuts import render_to_response, redirect
from mc_tasklist.models import Task

def index(request):
    tasks = Task.objects.order_by("id")
    c = {'tasks': tasks}
    c.update(csrf(request))
    return render_to_response('index.html', c)

def add(request):
    item = Task(name=request.POST["name"])
    item.save()
    return redirect("/")

def remove(request):
    item = Task.objects.get(id=request.POST["id"])
    if item:
        item.delete()
    return redirect("/")
Create a template with display code in mc_tasklist/templates/index.html:
<!DOCTYPE html>
<head>
  <meta charset="utf-8">
  <title>MemCachier Django tutorial</title>
  <!-- Fonts -->
  <link href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.4.0/css/font-awesome.min.css" rel='stylesheet' type='text/css' />
  <!-- Bootstrap CSS -->
  <link href="https://maxcdn.bootstrapcdn.com/bootstrap/4.0.0/css/bootstrap.min.css" rel="stylesheet" />
</head>
<body>
  <div class="container">
    <!-- New Task Card -->
    <div class="card">
      <div class="card-body">
        <h5 class="card-title">New Task</h5>
        <form action="add" method="POST">
          {% csrf_token %}
          <div class="form-group">
            <input type="text" class="form-control" placeholder="Task Name" name="name" required>
          </div>
          <button type="submit" class="btn btn-default">
            <i class="fa fa-plus"></i> Add Task
          </button>
        </form>
      </div>
    </div>
    <!-- Current Tasks -->
    {% if tasks %}
    <div class="card">
      <div class="card-body">
        <h5 class="card-title">Current Tasks</h5>
        <table class="table table-striped">
          {% for task in tasks %}
          <tr>
            <!-- Task Name -->
            <td class="table-text">{{ task.name }}</td>
            <!-- Delete Button -->
            <td>
              <form action="remove" method="POST">
                {% csrf_token %}
                <input type="hidden" name="id" value="{{ task.id }}">
                <button type="submit" class="btn btn-danger">
                  <i class="fa fa-trash"></i> Delete
                </button>
              </form>
            </td>
          </tr>
          {% endfor %}
        </table>
      </div>
    </div>
    {% endif %}
  </div>
  <!-- Bootstrap related JavaScript -->
  <script src="https://code.jquery.com/jquery-3.2.1.slim.min.js"></script>
  <script src="https://cdnjs.cloudflare.com/ajax/libs/popper.js/1.12.9/umd/popper.min.js"></script>
  <script src="https://maxcdn.bootstrapcdn.com/bootstrap/4.0.0/js/bootstrap.min.js"></script>
</body>
</html>
The view consists of two cards: one that contains a form to create new tasks, and another that contains a table with existing tasks and a delete button associated with each task.
Note that Django will automatically check each app’s templates folder for templates.
Our task list is now functional. Save the changes so far with:
$ git add .
$ git commit -m 'Add task list controller and views'
Deploy and view the task list on Elastic Beanstalk:
(venv) $ eb deploy
(venv) $ eb open
Test the app by adding a few tasks. We now have a functioning task list. With this complete, we can learn how to improve its performance with Memcache.
Add caching to Django
Memcache is an in-memory, distributed cache. Its primary API consists of two operations: SET(key, value) and GET(key). Memcache is like a hashmap (or dictionary) that is spread across multiple servers, where operations are still performed in constant time.
The most common use for Memcache is to cache the results of expensive database queries and HTML renders so that these expensive operations don’t need to happen over and over again.
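To make the SET/GET model concrete, here is a minimal single-process sketch of a cache with optional expiration. It only illustrates the semantics. It is not how Memcache or MemCachier is implemented, and the class name ToyCache and its clock parameter are invented for this example.

```python
import time


class ToyCache:
    """A single-process stand-in for a Memcache client (illustration only)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._store = {}     # key -> (value, expires_at or None)

    def set(self, key, value, timeout=None):
        # timeout is in seconds; None means the entry never expires
        expires_at = None if timeout is None else self._clock() + timeout
        self._store[key] = (value, expires_at)

    def get(self, key, default=None):
        entry = self._store.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if expires_at is not None and self._clock() >= expires_at:
            del self._store[key]  # expired entries count as misses
            return default
        return value


cache = ToyCache()
cache.set("greeting", "hello")
print(cache.get("greeting"))  # hello
print(cache.get("missing"))   # None
```

A real Memcache deployment spreads the keys across several servers and evicts old entries when memory runs out, which this sketch does not attempt to model.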
Set up Memcache
To use Memcache in Django, you first need to provision an actual Memcached cache. You can easily get one for free from MemCachier. MemCachier provides easy to use, performant caches that are compatible with the popular memcached protocol. It allows you to just use a cache without having to set up and maintain actual Memcached servers yourself.
There are three config variables you’ll need for your application to be able to connect to your cache: MEMCACHIER_SERVERS, MEMCACHIER_USERNAME, and MEMCACHIER_PASSWORD. You’ll need to add these variables to EB.
$ eb setenv MEMCACHIER_USERNAME=<username> MEMCACHIER_PASSWORD=<password> MEMCACHIER_SERVERS=<servers>
We can confirm that they’ve been set by running:
$ eb printenv
Then we need to configure the appropriate dependencies.
(venv) $ pip install pylibmc
(venv) $ pip freeze > requirements.txt
Since EB does not play nicely with pylibmc, we’ll also need to upgrade pip and install libmemcached using .ebextensions config files.
(venv) $ touch .ebextensions/upgrade_pip.config
Inside .ebextensions/upgrade_pip.config, include:
commands:
  pip_upgrade:
    command: /opt/python/run/venv/bin/pip install --upgrade pip
    ignoreErrors: false
We’ll also need to add the following to .ebextensions/task_list.config:
packages:
  yum:
    libmemcached-devel: []
container_commands:
  # ....
Configure Django with MemCachier
Configure your cache by adding the following to the end of django_tasklist/settings.py:
def get_cache():
    import os
    try:
        servers = os.environ['MEMCACHIER_SERVERS']
        username = os.environ['MEMCACHIER_USERNAME']
        password = os.environ['MEMCACHIER_PASSWORD']
        return {
            'default': {
                'BACKEND': 'django.core.cache.backends.memcached.PyLibMCCache',
                # TIMEOUT is not the connection timeout! It's the default expiration
                # timeout that should be applied to keys! Setting it to `None`
                # disables expiration.
                'TIMEOUT': None,
                'LOCATION': servers,
                'OPTIONS': {
                    'binary': True,
                    'username': username,
                    'password': password,
                    'behaviors': {
                        # Enable faster IO
                        'no_block': True,
                        'tcp_nodelay': True,
                        # Keep connection alive
                        'tcp_keepalive': True,
                        # Timeout settings
                        'connect_timeout': 2000,  # ms
                        'send_timeout': 750 * 1000,  # us
                        'receive_timeout': 750 * 1000,  # us
                        '_poll_timeout': 2000,  # ms
                        # Better failover
                        'ketama': True,
                        'remove_failed': 1,
                        'retry_timeout': 2,
                        'dead_timeout': 30,
                    }
                }
            }
        }
    except KeyError:
        # The MEMCACHIER_* variables are missing (local development):
        # use Django's in-memory cache.
        return {
            'default': {
                'BACKEND': 'django.core.cache.backends.locmem.LocMemCache'
            }
        }

CACHES = get_cache()
This configures the cache for both development and production. If the MEMCACHIER_* environment variables exist, the cache will be set up with pylibmc, connecting to MemCachier. If the MEMCACHIER_* environment variables don’t exist (that is, in development mode), Django’s simple in-memory cache is used instead.
Cache expensive database queries
Memcache is often used to cache expensive database queries. This simple example doesn’t include any expensive queries, but for the sake of learning, let’s assume that getting all tasks from the database is an expensive operation.
The task list database query code in mc_tasklist/views.py can be modified to check the cache first like so:
# ...
from django.core.cache import cache
import time

TASKS_KEY = "tasks.all"

def index(request):
    tasks = cache.get(TASKS_KEY)
    if not tasks:
        time.sleep(2)  # simulate a slow query.
        tasks = Task.objects.order_by("id")
        cache.set(TASKS_KEY, tasks)
    c = {'tasks': tasks}
    c.update(csrf(request))
    return render_to_response('index.html', c)
# ...
The above code first checks the cache to see if the tasks.all key exists in the cache. If it does not, a database query is executed and the cache is updated. Subsequent pageloads will not need to perform the database query. The time.sleep(2) only exists to simulate a slow query.
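One detail of the check-then-fill pattern is worth spelling out: a truthiness test such as "if not tasks" treats an empty result the same as a cache miss, so an empty task list would trigger the slow query on every request. The sketch below shows one way around that with a sentinel default. It uses a small dictionary-backed stand-in rather than Django’s cache so that it runs on its own, and the names get_or_compute, DictCache, and slow_query are hypothetical.

```python
_MISSING = object()  # sentinel that can never be a real cached value


def get_or_compute(cache, key, compute):
    """Return the cached value for key, computing and storing it on a miss."""
    value = cache.get(key, _MISSING)
    if value is _MISSING:
        value = compute()
        cache.set(key, value)
    return value


class DictCache:
    """Minimal stand-in exposing the get/set calls used above."""

    def __init__(self):
        self._data = {}

    def get(self, key, default=None):
        return self._data.get(key, default)

    def set(self, key, value):
        self._data[key] = value


calls = []


def slow_query():
    calls.append(1)  # count how often the "database" is hit
    return []        # an empty task list is still a valid result


cache = DictCache()
get_or_compute(cache, "tasks.all", slow_query)
get_or_compute(cache, "tasks.all", slow_query)
print(len(calls))  # 1
```

The same idea should carry over to the view above by passing a sentinel as the default argument of cache.get, though that is an adaptation you would need to verify against your own setup.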
Re-deploy the app to Elastic Beanstalk with
$ git add .
$ git commit -m 'Add caching'
$ eb deploy
and test the new functionality. To see what’s going on in your cache, open the MemCachier dashboard for your cache.
The first time you loaded your task list, you should have gotten an increase for the get miss and set commands. Every subsequent reload of the task list should increase get hits (refresh the stats in the dashboard).
Our cache is working, but there is still a major problem. Add a new task and see what happens. No new task appears on the current tasks list! The new task was created in the database, but the app is serving the stale task list from the cache.
Clear stale data
There are many techniques for dealing with an out-of-date cache.
Expiration: The easiest way to make sure the cache does not get stale is by setting an expiration time. The cache.set method can take an optional third argument, which is the time in seconds that the cache key should stay in the cache. If this option is not specified, the default TIMEOUT value in settings.py will be used instead.
You could modify the cache.set call to look like this:
cache.set(TASKS_KEY, tasks, 5)
But this only works when it is known how long the cached value stays valid. In our case, however, the cache gets stale upon user interaction (adding or removing a task).
Delete cached value: A straightforward strategy is to invalidate the tasks.all key when you know the cache is out of date: modify the add and remove views to delete the tasks.all key:
# ...
def add(request):
    item = Task(name=request.POST["name"])
    item.save()
    cache.delete(TASKS_KEY)
    return redirect("/")

def remove(request):
    item = Task.objects.get(id=request.POST["id"])
    if item:
        item.delete()
        cache.delete(TASKS_KEY)
    return redirect("/")
Key based expiration: Another technique to invalidate stale data is to change the key:
# ...
import random
import string

def _hash(size=16, chars=string.ascii_letters + string.digits):
    return ''.join(random.choice(chars) for _ in range(size))

def _new_tasks_key():
    return 'tasks.all.' + _hash()

TASKS_KEY = _new_tasks_key()

# ...

def add(request):
    item = Task(name=request.POST["name"])
    item.save()
    global TASKS_KEY
    TASKS_KEY = _new_tasks_key()
    return redirect("/")

def remove(request):
    item = Task.objects.get(id=request.POST["id"])
    if item:
        item.delete()
        global TASKS_KEY
        TASKS_KEY = _new_tasks_key()
    return redirect("/")
The upside of key based expiration is that you do not have to interact with the cache to expire the value. The LRU eviction of Memcache will clean out the old keys eventually.
Update cache: Instead of invalidating the key, the value can also be updated to reflect the new task list:
# ...
def add(request):
    item = Task(name=request.POST["name"])
    item.save()
    cache.set(TASKS_KEY, Task.objects.order_by("id"))
    return redirect("/")

def remove(request):
    item = Task.objects.get(id=request.POST["id"])
    if item:
        item.delete()
        cache.set(TASKS_KEY, Task.objects.order_by("id"))
    return redirect("/")
Updating the value instead of deleting it will allow the first pageload to avoid having to go to the database.
You can use the delete, key based expiration, or update strategy to make sure the cache is never out of date.
As usual, redeploy the app afterwards with eb deploy.
Now when you add a new task, all the tasks you’ve added since implementing caching will appear.
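The key based expiration example above keeps the current key in a module-level variable, which each server process holds separately. Assuming the app runs in more than one process, one possible variation is to keep a random token in the cache itself and build the task list key from it. This is a sketch under that assumption, using a dictionary-backed stand-in for the cache. The names TOKEN_KEY, tasks_key, and invalidate_tasks are hypothetical.

```python
import uuid

TOKEN_KEY = "tasks.token"  # hypothetical key name for this sketch


class DictCache:
    """Minimal stand-in exposing the get/set calls used below."""

    def __init__(self):
        self._data = {}

    def get(self, key, default=None):
        return self._data.get(key, default)

    def set(self, key, value):
        self._data[key] = value


def _new_token():
    return uuid.uuid4().hex


def tasks_key(cache):
    """Build the task list key from the token currently stored in the cache."""
    token = cache.get(TOKEN_KEY)
    if token is None:
        # First use, or the token was evicted: start a fresh key space.
        token = _new_token()
        cache.set(TOKEN_KEY, token)
    return "tasks.all." + token


def invalidate_tasks(cache):
    """Replace the shared token so every process switches to a new key."""
    cache.set(TOKEN_KEY, _new_token())
```

This trades away the main upside mentioned above: every request now needs one extra cache read to look up the token, and invalidation writes to the cache.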
Use Django’s integrated caching
Django also has a few built-in ways to use your Memcache to improve performance. These mainly target the rendering of HTML, which is an expensive operation that is taxing for the CPU.
Caching and CSRF
You cannot cache any views or fragments that contain forms with CSRF tokens because the token changes with each request. For the sake of learning how to use Django’s integrated caching we will disable Django’s CSRF middleware. Since this task list is public, this is not a big deal but do not do this in any serious production application.
Comment out CsrfViewMiddleware in django_tasklist/settings.py:
MIDDLEWARE = [
    # ...
    # 'django.middleware.csrf.CsrfViewMiddleware',
    # ...
]
Cache template fragments
Django allows you to cache rendered template fragments. This is similar to snippet caching in Flask, or caching rendered partials in Laravel. To enable fragment caching, add {% load cache %} to the top of your template.
Do not cache fragments that include forms with CSRF tokens.
To cache a rendered set of task entries, we use a {% cache timeout key %} statement in mc_tasklist/templates/index.html:
{% load cache %}
<!-- ... -->
<table class="table table-striped">
  {% for task in tasks %}
    {% cache None 'task-fragment' task.id %}
    <tr>
      <!-- ... -->
    </tr>
    {% endcache %}
  {% endfor %}
</table>
<!-- ... -->
Here the timeout is None and the key is a list of strings that will be concatenated. As long as task IDs are never reused, this is all there is to caching rendered snippets. The MySQL database we use on Elastic Beanstalk does not reuse IDs, so we’re all set.
If you use a database that does reuse IDs, you need to delete the fragment when its respective task is deleted. You can do this by adding the following code to the task deletion logic:
from django.core.cache.utils import make_template_fragment_key
# Build the key before calling item.delete(), which clears item.id.
key = make_template_fragment_key("task-fragment", vary_on=[str(item.id)])
cache.delete(key)
Let’s see the effect of caching the fragments in our application. You should now observe an additional get hit for each task in your list whenever you reload the page (except the first reload).
Cache entire views
We can go one step further and cache entire views instead of fragments. This should be done with care, because it can result in unintended side effects if a view frequently changes or contains forms for user input. In our task list example, both of these conditions are true because the task list changes each time a task is added or deleted, and the view contains forms to add and delete a task.
Do not cache views that include forms with CSRF tokens.
You can cache the task list view with the @cache_page(timeout) decorator in mc_tasklist/views.py:
# ...
from django.views.decorators.cache import cache_page

@cache_page(None)
def index(request):
    # ...

# ...
Because the view changes whenever we add or remove a task, we need to delete the cached view whenever this happens. This is not straightforward. We need to learn the key when the view is cached so that we can delete it later:
# ...
from django.utils.cache import learn_cache_key

VIEW_KEY = ""

@cache_page(None)
def index(request):
    # ...
    response = render_to_response('index.html', c)
    global VIEW_KEY
    VIEW_KEY = learn_cache_key(request, response)
    return response

def add(request):
    # ...
    cache.delete(VIEW_KEY)
    return redirect("/")

def remove(request):
    item = Task.objects.get(id=request.POST["id"])
    if item:
        # ...
        cache.delete(VIEW_KEY)
    return redirect("/")
To see the effect of view caching, reload your application. On the first refresh, you should see the get hit counter increase according to the number of tasks you have, as well as an additional get miss and set, which correspond to the view that is now cached. Any subsequent reload will increase the get hit counter by just two, because the entire view is retrieved with two get commands.
Note that view caching does not obsolete the caching of expensive operations or template fragments. It is good practice to cache smaller operations within cached larger operations, or smaller fragments within larger fragments. This technique (called Russian doll caching) helps with performance if a larger operation, fragment, or view is removed from the cache, because the building blocks do not have to be recreated from scratch.
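The effect of Russian doll caching can be shown with a small standalone sketch: an outer cached page built from individually cached rows. When the page entry is dropped, only the rows that are not yet cached need real rendering work. The code uses a dictionary-backed stand-in for the cache, and the names DictCache, render_row, and render_page are hypothetical, not Django APIs.

```python
class DictCache:
    """Minimal stand-in exposing get, set, and delete."""

    def __init__(self):
        self._data = {}

    def get(self, key, default=None):
        return self._data.get(key, default)

    def set(self, key, value):
        self._data[key] = value

    def delete(self, key):
        self._data.pop(key, None)


row_renders = []  # records which rows needed real rendering work


def render_row(cache, task_id, name):
    key = "row.{}".format(task_id)
    html = cache.get(key)
    if html is None:
        row_renders.append(task_id)
        html = "<tr><td>{}</td></tr>".format(name)
        cache.set(key, html)
    return html


def render_page(cache, tasks):
    html = cache.get("page")
    if html is None:
        rows = "".join(render_row(cache, tid, name) for tid, name in tasks)
        html = "<table>{}</table>".format(rows)
        cache.set("page", html)
    return html


cache = DictCache()
tasks = [(1, "write"), (2, "test")]
render_page(cache, tasks)    # renders both rows and the page
cache.delete("page")         # a new task invalidates only the outer layer
tasks.append((3, "deploy"))
render_page(cache, tasks)    # rows 1 and 2 come from the cache
print(row_renders)           # [1, 2, 3]
```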
Using Memcache for session storage
Memcache works well for storing information for short-lived sessions that time out. However, because Memcache is a cache and therefore not persistent, long-lived sessions are better suited to permanent storage options, such as your database.
For short-lived sessions, configure SESSION_ENGINE to use the cache backend in django_tasklist/settings.py:
SESSION_ENGINE = 'django.contrib.sessions.backends.cache'
For long-lived sessions, Django allows you to use a write-through cache, backed by a database. This is the best option for performance while guaranteeing persistence. To use the write-through cache, configure the SESSION_ENGINE in django_tasklist/settings.py like so:
SESSION_ENGINE = 'django.contrib.sessions.backends.cached_db'
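The write-through idea can be sketched in a few lines: every write goes to both the database and the cache, and reads fall back to the database only when the cache has lost the entry. This is a toy model built on plain dictionaries to illustrate the concept, not Django’s actual session backend. The class name WriteThroughSessions is invented for this example.

```python
class WriteThroughSessions:
    """Toy session store: writes hit both layers, reads prefer the cache."""

    def __init__(self, cache, database):
        self.cache = cache        # fast, but may lose entries at any time
        self.database = database  # slower, but persistent

    def save(self, session_id, data):
        self.database[session_id] = data
        self.cache[session_id] = data

    def load(self, session_id):
        if session_id in self.cache:
            return self.cache[session_id]
        data = self.database.get(session_id)
        if data is not None:
            self.cache[session_id] = data  # repopulate after an eviction
        return data


store = WriteThroughSessions(cache={}, database={})
store.save("abc", {"user": 42})
store.cache.clear()        # simulate the cache evicting everything
print(store.load("abc"))   # {'user': 42}
```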
For more information on how to use sessions in Django, please see the Django Session Documentation.
Clean up
Once you’re done with this tutorial and don’t want to use it anymore, you can clean up your EB instance by using:
$ eb terminate
This will clean up all of the AWS resources.