MemCachier with Django, Docker and AWS Elastic Container Service

In this tutorial we’re going to look a little beyond the usual “getting started” Django tutorials, at some of the things you might want to do to deploy applications to production.

We’re going to do the following:

  1. Start with a very simple demonstration Django application. This will be an image thumbnailer, where you give an image URL and a desired thumbnail size and the application downloads the image from the given URL, generates a thumbnail and returns the result to you as an image.
  2. Dockerize our application, using Gunicorn as a production-ready web server.
  3. Deploy the application to Amazon’s Elastic Container Service, which will give us load balancing and container health checking and resilience.
  4. Add a MemCachier cache to our application to speed it up. (This will involve thinking about managing secrets in ECS, as well as showing how deploying new versions of containers works there.)

The MemCachier content of this tutorial is a relatively small fraction of what we’re going to do, but the idea is to demonstrate a more realistic use case than the usual toy tutorials (our demonstration application is still a toy, but it’s a toy that definitely benefits from caching). There are a number of things you have to consider when doing this sort of deployment that aren’t too obvious, and it’s useful to see the whole setup process from beginning to end.

Prerequisites

We’ll be working in Linux, although the distribution you use shouldn’t matter too much (I use Arch Linux, but Ubuntu or Debian or more or less any other “normal” distribution should be fine too).

You’ll need to have the following things installed (follow the links for installation instructions):

You’ll also need an AWS account. If you’ve not used AWS before, you can set up an account here. You get a pile of free resources to use for the first year after sign-up, and if you follow along with what we do in this tutorial, you shouldn’t get charged for anything (if you already have an AWS account, you might get charged a little for the load balancer we’re going to use).

All the code for the tutorial is in a GitHub repository here. Clone the repo and we can get started!

Part 1: A simple Django application

There are three versions of our Django application in the top-level of the tutorial repository. In Part 1, we’ll work on the basic development version of the application, found in the thumbnailer-basic directory.

Setting up and a quick look at our application

We’ll manage Python dependencies with a virtual environment. To install the necessary Python dependencies, do the following:
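Something like the following, assuming a plain virtualenv workflow (the directory name `venv` is a convention here, not necessarily what the repository’s README uses):

```shell
# In the thumbnailer-basic directory: create and activate a virtual
# environment, then install the pinned dependencies.
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```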

Once this is done, you can run the application using Django’s development server:
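This is the standard Django development server invocation, run from the project directory:

```shell
cd thumbnailer
python manage.py runserver    # serves on http://localhost:8000 by default
```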

If you then point your browser at http://localhost:8000, you’ll see the UI for our application. Paste in the URL to an image somewhere on the internet, select a thumbnail size, press the “Thumbnail it!” button and you’ll get a result page with some information about the image and a thumbnail.

Application structure

The contents of the thumbnailer-basic directory are a normal Django application (plus a couple of extras for deployment that we’ll talk about later). Here’s what we have (things in parentheses are standard Django things we won’t talk about):

.
├── requirements.txt               Python dependencies
└── thumbnailer                    Django project directory
    ├── main                       Single Django app
    │   ├── (apps.py)
    │   ├── forms.py               URL+size form definition
    │   ├── (__init__.py)
    │   ├── templates
    │   │   └── main
    │   │       ├── index.html     Input form template
    │   │       └── result.html    Thumbnail result template
    │   ├── utils.py               Thumbnailing worker function
    │   └── views.py               Single input+result view
    ├── (manage.py)
    └── thumbnailer
        ├── (__init__.py)
        ├── (settings.py)
        ├── (urls.py)
        └── (wsgi.py)

We have a top-level thumbnailer directory, containing global configuration in the thumbnailer sub-directory, and a single main Django app that does all the work.

There are a couple of things that most Django applications have that we don’t need here: we don’t have any models, so we don’t need a database, and we don’t use any authorization. That means that our settings file is simplified a little. For the purposes of this tutorial, we’re going to be running everything in debug mode, although you obviously shouldn’t do that in production for real applications.

Our application has a single URL (“/”) and a single class-based form view to implement it. Here’s what’s in views.py:

This view renders a form with inputs for a URL and a thumbnail size.

When valid form input is POSTed, we pass the URL and thumbnail size to a worker function (defined in utils.py), then render the result using the main/result.html template (which, as well as the thumbnail image, shows some other information, including the request processing time).

To keep things as simple as possible, the resulting thumbnail image is rendered using a data URI for its image content. The function that does the work of generating the thumbnail downloads the original image from the supplied image URL, creating a Pillow image from it, then uses the Image.thumbnail method to resize it. The generated thumbnail image is then extracted as a PNG and Base64-encoded to make a data URI:
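A sketch of that worker function — the function name and the exact keys in the returned context are my guesses, but the download / Pillow thumbnail / Base64 data-URI pipeline is as described above:

```python
# utils.py (sketch) -- download an image, thumbnail it with Pillow,
# and return template context including a data URI for the thumbnail.
import base64
from io import BytesIO

import requests
from PIL import Image


def thumbnail_image(url, size):
    """Download the image at url and make a size x size thumbnail."""
    response = requests.get(url)               # synchronous download
    original = Image.open(BytesIO(response.content))

    thumb = original.copy()
    thumb.thumbnail((size, size))              # resize in place, keeping aspect

    # Extract the thumbnail as PNG and Base64-encode it into a data URI.
    buffer = BytesIO()
    thumb.save(buffer, format='PNG')
    data = base64.b64encode(buffer.getvalue()).decode('ascii')
    return {
        'url': url,
        'original_size': original.size,
        'thumbnail': 'data:image/png;base64,' + data,
    }
```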

All of the image processing is done in memory, using BytesIO streams to move data around. The request to the original image URL is made synchronously using the requests package, which means that the server thread servicing this request will be blocked while the image is downloaded. The time taken to download the original image is the main contribution to the request service time, which is displayed in the results page when the thumbnail is rendered. With my internet connection, for most smallish images, the total request service time varies between about 500 ms and about 1500 ms, depending on how my internet connection is behaving and how fast the server hosting the original image is. This sort of thing is a good candidate for caching, as we’ll see later!

Performance measurement

Later on, we’re going to want to see how much of a difference caching makes to the performance of our thumbnailing application. I’ve written some Javascript code to do this, using webdriver.io, which is a test framework that uses Selenium to drive web browsers. It’s normally used for end-to-end testing of web applications, but it can be slightly abused for this kind of benchmarking too. It’s just about the simplest way to use Selenium that I’ve found. The code is in the benchmarking directory in the repository.

The benchmarking code is a standard Node.js application. You can install the Javascript dependencies by doing npm install in the benchmarking directory, then the benchmarks are run by doing npm test (optionally giving a URL to connect to in the TEST_URL environment variable, which defaults to http://localhost:8000 to connect to the Django development server).
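In other words, something like this (the deployed TEST_URL value is a placeholder, not a real deployment):

```shell
cd benchmarking
npm install     # install webdriver.io and the other Javascript dependencies
npm test        # benchmark the default http://localhost:8000

# Or point the benchmarks at a deployed instance:
TEST_URL=http://my-deployed-thumbnailer.example.com npm test
```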

When the benchmarks are run, they perform 100 thumbnail requests via automation of 10 instances of Firefox, choosing image URLs randomly from a small list of online images. The results are written to a results.json file, and contain information about the image URL and size requested, the server that processed the request and the time taken both on the server and in total for the request.

We’ll use this code to compare performance of the non-caching and caching code deployed to ECS later on.

Part 2: Deploying to Amazon ECS

Getting our application working in a development setup is only half the battle. We need to come up with a good way to deploy our application on production infrastructure. That means thinking about performance, resilience and monitoring. First of all, we shouldn’t use Django’s development server for serving our application in production. There are a number of better options, and we’re going to use Gunicorn, a high-performance Python server suitable for production deployments. Once we have our application working with Gunicorn, we’ll make some small changes needed for operation behind an AWS load balancer (basically just making sure that Django’s ALLOWED_HOSTS variable is set correctly). Then we’ll package the application up as a Docker container (we’ll “dockerize” it). This container can then be deployed to Amazon ECS (although we’ll also demonstrate it running locally first).

The code used in this part of the tutorial is in thumbnailer-ecs-no-caching directory of the tutorial repository.

Running with Gunicorn

Python web applications usually support an interface called WSGI (Web Server Gateway Interface), which defines a standard way for web servers to serve web applications. Django supports WSGI and provides a WSGI application stub in the wsgi.py file it generates as part of setting up a new project.

WSGI applications can be served by any web server that supports the WSGI interface: we’re going to use Gunicorn which is simple to set up and has pretty good performance.

Why not just use the Django development server in production anyway? Well, it’s single-threaded and mostly optimised for quick restarts when we change our code during the development process. A production web server will have multiple server threads (or processes, depending on exactly how it’s set up) and will be optimised just for serving HTTP requests as quickly as possible.

To run Gunicorn, we need to make sure it’s installed (using pip install then putting the relevant versioned dependencies in the requirements.txt file), then run the gunicorn executable with command-line arguments to configure the server and pass it our WSGI application.

The command we’re going to use is
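roughly the following (treat the exact flag values as a reconstruction rather than the repository’s verbatim script):

```shell
gunicorn --bind 0.0.0.0:8000 \
         --workers 3 \
         --worker-class gevent \
         --access-logfile - --error-logfile - --capture-output \
         thumbnailer.wsgi
```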

which we’ll put into a startup script (start.sh) in the thumbnailer directory and make it executable via chmod +x start.sh. The command-line arguments we use here do the following:

  • bind: which address and port to listen on;
  • workers: the number of worker processes to spawn;
  • worker-class: the type of worker to use – see below;
  • access-logfile, error-logfile, capture-output: arguments to force Gunicorn to log all errors and access to standard error and to pass Django output to the same place.

Finally, we pass the name of the Python module containing the WSGI application.

The worker-class argument specifies that we want to use asynchronous worker processes based on the gevent library. Since each request to the thumbnailer triggers an HTTP download from the image URL we pass in, requests can take arbitrarily long to process. As the Gunicorn documentation says:

The default synchronous workers assume that your application is resource-bound in terms of CPU and network bandwidth. Generally this means that your application shouldn’t do anything that takes an undefined amount of time. An example of something that takes an undefined amount of time is a request to the internet. At some point the external network will fail in such a way that clients will pile up on your servers. So, in this sense, any web application which makes outgoing requests to APIs will benefit from an asynchronous worker.

(To use the gevent worker type, we also need to install the gevent and greenlet packages and add them to requirements.txt.)

Once this is all set up, Gunicorn can be run just by running the start.sh script. If you do this, you’ll be able to access the thumbnailer at http://localhost:8000 as normal, and if you look at the processes running (do something like ps -wweFH and scroll back through your terminal to look for the process running the start.sh script), you’ll see that Gunicorn has started three worker processes to service requests.

ALLOWED_HOSTS setup

The Django ALLOWED_HOSTS setting variable is used to constrain the server names for which a Django application will service requests, and prevents some kinds of attacks on our web server. Running in development, we don’t need to worry about this setting, but for production deployments, we need to set it correctly.

Normally, if you’re hosting a production web application at http://www.my-cool-company.com, you would just put www.my-cool-company.com in ALLOWED_HOSTS and everything would be fine. We don’t have a DNS domain to use for our deployment though, so we need to do things a little differently. We’re going to be using an AWS Elastic Load Balancer (ELB) to distribute HTTP requests across a few instances of our application, and all ELB endpoints have names of the form *.elb.amazonaws.com, so we can just put .elb.amazonaws.com in ALLOWED_HOSTS – this isn’t perfectly secure, but is good enough for a demonstration.

There is one additional wrinkle here, in that we need to support the health check requests that the load balancer sends to check that our application instances are alive. (If these health checks fail often enough, the load balancer kills and restarts our container instances.) The problem is that these health check requests are made directly to the private IP addresses of the EC2 instances hosting our containers, not to the .elb.amazonaws.com name (since the health check requests come from behind the load balancer endpoint). We can deal with this by adding the private IP address (which will lie in the 10.0.0.0 range we’re going to use in the virtual private cloud we set up during ECS configuration) to ALLOWED_HOSTS.

We can do this by adding the following bit of code to our settings.py:
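A sketch of that code, using the ECS task metadata (v2) endpoint — the exact JSON path into the metadata is an assumption that may need adjusting for your ECS platform version:

```python
# settings.py (excerpt) -- add the container's private IP to
# ALLOWED_HOSTS so the load balancer's health checks are accepted.
import requests

ALLOWED_HOSTS = ['.elb.amazonaws.com', 'localhost']

try:
    # The ECS container agent serves task metadata at a fixed
    # link-local address inside the container.
    metadata = requests.get('http://169.254.170.2/v2/metadata',
                            timeout=1).json()
    for container in metadata.get('Containers', []):
        for network in container.get('Networks', []):
            ALLOWED_HOSTS += network.get('IPv4Addresses', [])
except requests.exceptions.RequestException:
    # Not running on ECS (e.g. local development): nothing to add.
    pass
```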

This works by accessing the task metadata HTTP endpoint implemented by the container agent that AWS runs on EC2 instances hosting containers managed by ECS. This returns a JSON object containing all sorts of useful information, including the IP address of the container instance. Adding this to ALLOWED_HOSTS allows Django to service the health checks from the load balancer.

Dockerizing

Once we’ve reorganised our Django code as described above for production operation, we can create a Docker container for it. To do this, we obviously need to have Docker installed and working.

Docker works by building container images from layers using a type of union filesystem to reduce data transfer requirements when containers are updated. We define the process of creating a container using a Dockerfile which is a simple script that sets up the container contents. The format of the Dockerfile is documented here, but we only need a few features. Here’s the entirety of our Dockerfile:

# Build from default Python image (don't use "django" base image: it's
# obsolete).
FROM python:3.6.5

# This is the port that Gunicorn uses, so expose it to the outside
# world.
EXPOSE 8000

# Don't just add the base code directory, to try to cut down on
# rebuild sizes.
RUN mkdir /thumbnailer
WORKDIR /thumbnailer

# Install all the requirements -- doing things in this order should
# reduce the size of redeployments if we don't change dependencies.
ADD requirements.txt /thumbnailer/
RUN pip install -r requirements.txt

# Add the main code directory and point at the start script.
ADD . /thumbnailer
WORKDIR /thumbnailer/thumbnailer
CMD ["./start.sh"]

We start from a base image, which will be downloaded from a well-known Docker repository (the Docker Hub). We use an image with a suitable version of Python pre-installed. These base images are minimal operating system images suitable for use as containers. Any additional software you need in your container, you will need to install yourself.

The main things you do in a Dockerfile are to expose container ports to the outside world (we expose port 8000, which is the port that our start.sh tells Gunicorn to serve our application on), to import files from the host file system into the container (here, we use the ADD command to do that), to run shell commands within the container as it’s being set up (using the RUN command) and to establish an entry point for the container (i.e. the single process that will run when the container is launched – we use the CMD command to do this).

You don’t really need to know too much about how Docker works to use it, but it’s useful to know a little bit to help with optimising the way that container images get built. It’s easy to end up with a situation where you need to upload large amounts of data to deploy even trivial changes to your code. A bit of care can avoid that. The most important thing to know is that Docker treats each line in your Dockerfile as a command to create a new container image based on the result of the previous line. If Docker can satisfy itself that there have been no changes that affect the results of a line in the Dockerfile since the last time that the image was built, then it can reuse the results of the last build.

The things we need to do to set up our container image here are:

  1. Get the base Python image.
  2. Expose the port we want to use.
  3. Install all the Python package dependencies in our requirements.txt file into the container image.
  4. Import all of the code for our Django application into the container image.
  5. Set up the Gunicorn start.sh script as the entry point for the container image.

The key thing to notice here is that if we make a change to our code, but don’t change the Python packages we depend on, then Docker shouldn’t need to repeat steps 1-3 of this process, just reusing the results from the last build. That’s why we bring the requirements.txt file into the container image with a separate ADD command early in the Dockerfile: that gives us everything we need to install the Python dependencies, and makes a container image at this point in the build process that does not depend on the rest of our Python source files. We can change other things about our application and the Python dependency installation step won’t need to be rerun.

That might not seem so important for the image build process, which is relatively quick anyway, but when it comes to uploading container images for deployment and redeployment after code changes, it makes a very big difference. Because of this step-wise layered way that Docker works, if we can set things up so that only the final steps of the image build process change when we make code changes, only those layers of the resulting container image will need to be uploaded when we redeploy our container image. We’ll see in detail how this works later!

There’s one more thing that helps to optimise container image builds, which is the .dockerignore file. This is much like .gitignore for Git, giving a list of file patterns that Docker should ignore when processing ADD (and similar) commands. We can use it to exclude virtual environment directories, Python compiled code caches, and so on.

With our Dockerfile set up, we can build and tag a container image by saying
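that is (the thumbnailer tag is the one that shows up in the docker ps output later):

```shell
docker build -t thumbnailer .
```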

in the directory containing the Dockerfile. This takes a little while the first time you run it, because it needs to download the base Python container image. Once it’s done, you can run the docker images command to see a list of installed container images, and the thumbnailer image will be among them. In Docker, images and containers are usually identified by SHA hashes, but the -t argument to the build command above applies a tag to the resulting image, which is a little easier to deal with.

Once we’ve built the image, we can run it by saying
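something like the following, mapping container port 8000 to host port 8001:

```shell
docker run -p 8001:8000 thumbnailer
```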

This command starts a container from the image we’ve built and publishes port 8000 from the container (which is where Gunicorn is serving the thumbnailer application) as port 8001 on the host, so we can now access the application by pointing a browser at http://localhost:8001.

We can take a look at what’s running in the container in another terminal. First, we do docker ps to get a list of running containers, resulting in output that looks something like this:

CONTAINER ID        IMAGE                COMMAND             CREATED             STATUS              PORTS                    NAMES
81e29426f197        thumbnailer:latest   "./start.sh"        11 seconds ago      Up 7 seconds        0.0.0.0:8001->8000/tcp   gallant_swirles

We can execute another process in a running container using Docker’s exec command. If we do this:
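that is, launching an interactive shell inside the running container:

```shell
docker exec -it <container-id> /bin/bash
```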

where <container-id> is the hash of our running container from the docker ps output (here, 81e29426f197), we get a shell running inside the container (the -it flags tell Docker that this is an interactive task and that it should allocate a pseudo-TTY inside the container, both of which are necessary for an interactive shell). In the shell running inside the container, we can then do:
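for example, a ps invocation like the one we used on the host:

```shell
ps -wweFH
```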

(The flags to ps cause it to display all running processes, showing their full command lines, and laid out in a process tree.) As you can see, apart from the shell and the ps process, the container contains only the parent Gunicorn process and its three child worker processes.

And finally, we can stop the container using
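that is:

```shell
docker stop <container-id>
```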

with the ID of the container.

Now that we have a container image built, we need to put it into a repository that’s accessible to AWS. AWS has a container registry service of its own (called ECR, Elastic Container Registry) that provides you with a private container repository associated with your AWS account, and for production containers you might want to use this (because it’s private). For this tutorial, we’re going to use the public Docker Hub container registry though, mostly because it will require smaller file uploads – your ECR repository is completely empty to start with, so Docker has to upload all of the fixed operating system container layers as well as the layers that are specific to your application. Using Docker Hub, most of those fixed layers are already cached, so we should only need to upload a smaller amount of data.

You’ll need to create an account at https://hub.docker.com. For this tutorial, I’ll use my personal account, which has account name skybluetrades. Once you’ve created an account and logged in, you’ll be presented with a list of your repositories (initially empty). Click on the “Create Repository” button to create a new public repository, and call it thumbnailer. (The full name you’ll use to refer to the repository also includes your Docker Hub user name, so for me, it’s skybluetrades/thumbnailer.)

We can now build a container image suitably tagged for uploading to Docker Hub – in the thumbnailer-ecs-no-caching directory, do the following (replacing skybluetrades with your own Docker Hub username):
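the same build command as before, but tagged with the Docker Hub repository name:

```shell
docker build -t skybluetrades/thumbnailer .
```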

Then, to upload the container image to Docker Hub, we first log in:
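using the standard Docker Hub login command:

```shell
docker login -u skybluetrades
```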

again using your own user name, and then supplying the requested password. Now you can push the container image by saying:
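the push refers to the repository created above:

```shell
docker push skybluetrades/thumbnailer
```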

This will take a while, but you will see a number of messages saying “Mounted from library/python”, showing that many of the layers of our image are already present on Docker Hub so don’t need to be uploaded. Once the upload is complete, if you now look under the “Tags” tab in the repository page on Docker Hub for your thumbnailer repository, you’ll see that there is an image tagged “latest”.

ECS setup

We’re now going to deploy our Dockerized application on Amazon’s Elastic Container Service (ECS). There is quite a lot of setup involved in this, but the payoff is pretty good: we’ll end up with a scalable and load balanced deployment with integrated logging where we can easily deploy new versions of our application at the click of a button.

The first time you log into the AWS Management Console, it can seem a little overwhelming. There are dozens of different services available on the platform, lots of acronyms and specialised vocabulary, and it can be hard to know where to start. I’ve added an appendix to give a quick and minimal explanation to the parts of the AWS infrastructure we’re going to use. Whenever the instructions say “the EC2 console”, “the ECS console”, “the CloudWatch console” and so on, just go the main AWS console page and click on the link for the appropriate service to get to the service-specific console page.

ECS is a technology that melds a number of AWS components to simplify deploying and managing container-based applications. (It might not look like it’s simplifying much as we work through setting it up, but it’s quite a lot more complicated to manage all the AWS resources ECS controls by hand.)

There are three things we need to do:

  1. We need to create an ECS cluster to host our application. There are two options here: we can launch EC2 instances ourselves and use them to host our application via ECS, or we can use a technology called AWS Fargate to have ECS manage EC2 instances for us. We’re going to use Fargate, as it ends up requiring us to make fewer decisions about things. Fargate is what people often call a “serverless” technology, which doesn’t mean there are no servers, just that someone else manages the servers for us…
  2. Once we have an ECS cluster, we need to define an ECS task definition based on our container image and launch it into the cluster.
  3. Finally, we create an ECS service that will run multiple copies of our task behind a load balancer.

We’re going to do all of this setup through the AWS console.

Creating an ECS cluster

We want to use the AWS Fargate technology for our cluster. This technology is only supported in a subset of AWS regions at the moment, so we need to work in one of us-east-1 (North Virginia), us-east-2 (Ohio), us-west-2 (Oregon) or eu-west-1 (Ireland). We’ll use eu-west-1 for the tutorial, and if you’re following along, you should use either us-east-1, us-west-2 or eu-west-1, since these are the regions where MemCachier caches are available. (When setting up the ECS cluster, you’ll be able to tell if you’re not using a suitable region because the “Powered by AWS Fargate” option just won’t be available.)

  1. Log in to AWS and switch to one of the regions listed above (using the region menu near the top right of the page).
  2. Go to the ECS console (by selecting the ECS service from the main AWS console service list), click on “Clusters” in the left sidebar menu, then click on the “Create Cluster” button.
  3. Select the “Networking only / Powered by AWS Fargate” cluster template (you won’t see this option if you’re trying to use a region that doesn’t support Fargate). Click “Next step”.
  4. On the “Configure cluster” page, set the cluster name to “thumbnailer-cluster”, enable the “Create VPC” option and accept the defaults for the IP address range and subnets.
  5. Finally, click the “Create” button to create the cluster.

The cluster creation process uses an AWS service called CloudFormation to manage setting up networking resources. It takes a minute or two for the new VPC and its associated resources to be configured. Once the setup is complete, you can go to the CloudFormation console, where you’ll see the resources that were created (looking at this sort of thing can be a good way to learn how to set up AWS resources).

So now we have a cluster, but no tasks or services yet.

Creating an ECS task definition

Before we can create a service to run tasks in our cluster, we need to create a task definition.

  1. From the ECS console left-hand sidebar, click on “Task Definitions” then click on “Create new Task Definition”.
  2. Select the “FARGATE” launch type compatibility and click “Next step.”
  3. Fill in the configuration details:
    • Task Definition Name: thumbnailer-task;
    • Task Role: Select the ecsTaskExecutionRole from the dropdown;
    • Network Mode: awsvpc (you can’t change this);
    • Task execution role: ecsTaskExecutionRole;
    • Task memory: 0.5 GB (the minimum);
    • Task CPU: 0.25 vCPU (the minimum).
  4. Under “Container Definitions”, click “Add container” and fill in the container details:
    • Container name: thumbnailer-container;
    • Image: skybluetrades/thumbnailer:latest (replace skybluetrades with your Docker Hub user name);
    • Soft Memory Limits: 256 MB;
    • Port mappings: 8000 TCP (you only need to specify the internal container port, since ECS deals with assigning external ports).
    Accept all other default values, and click “Add”.
  5. Click the “Create” button to create the task definition.

This all does a couple of different things: it creates the task definition, which is what ECS uses to associate our container image with resources to run the image (we choose options to use more or less the minimum possible computational resources here); it sets up security roles that allow ECS to start EC2 instances for you as required; and it creates a logging group in the AWS CloudWatch logging service to aggregate log output from all the tasks we run from this task definition. (You can see this log group by going to the CloudWatch console and selecting “Logs” from the left-hand sidebar. The log group for our task definition is called /ecs/thumbnailer-task. It doesn’t have anything in it yet, but when ECS starts tasks, the logs from them will go into individual log streams within the /ecs/thumbnailer-task log group, where we can view them, create rules and alarms based on them, and so on.)

Creating an ECS service

The last step to getting our container image running on ECS is to create an ECS service.

  1. On the ECS clusters list, click on the thumbnailer-cluster entry, then click on “Create” under the “Services” tab.
  2. Configure the service:
    • Launch type: FARGATE;
    • Task Definition: thumbnailer-task;
    • Cluster: thumbnailer-cluster;
    • Service name: thumbnailer-service;
    • Number of tasks: 3.
    Click on “Next step”.
  3. On the “VPC and security groups” section:
    • Select the VPC created as part of the cluster (it should already be selected, but you can check by looking at the CloudFormation stack that ECS created for the cluster);
    • Enable both subnets of the VPC.
    • Under “Security groups”, choose “Edit” to modify the ingress rules for the service security group: change the security group name to thumbnailer-sg and change the single existing ingress rule to be a “Custom TCP” for port 8000 (the port our container image exposes).
  4. In the “Load balancing” section, select “Application Load Balancer”. Open the link that appears to the EC2 Console in a new browser tab to create a new load balancer. The link will open at the start of the load balancer creation wizard. From there:
    • Click on “Create” under “Application Load Balancer” and fill in the configuration details on the next page;
    • Name: thumbnailer-lb;
    • Scheme: internet-facing;
    • Listeners: accept the default HTTP listener;
    • Under “Availability Zones”, select the ECS cluster VPC and enable both subnets (which are in different AZs);
    • Click to go to the next page, then click again (there’s nothing to do on the “Configure Security Settings” page);
    • On the “Configure Security Groups” page, select “Create a new security group”, give the new security group the name thumbnailer-lb-sg and make sure it has a single ingress rule for HTTP traffic (port 80);
    • On the “Configure Routing” page, select “New target group”, give the target group the name thumbnailer-tg, change the “Target type” to “ip”, and accept all other settings;
    • Skip over the “Register targets” page: ECS will be responsible for managing the instances in our load balancer target group as it starts and stops them – we don’t need to do anything here.
    Check over the parameters on the “Review” page, then click on “Create” to create the load balancer.
  5. Once the load balancer creation has completed, return to the browser tab with the ECS service creation wizard, press the refresh button next to the “Load balancer name” field and select the load balancer you just created.
  6. In the “Container to load balance” section, select the thumbnailer-container and click on “Add to load balancer”. Choose “80:HTTP” for the “Listener port” and select the target group we created for the load balancer as the “Target group name”. Disable service discovery, accept defaults for everything else and press “Next step”.
  7. Skip the autoscaling setup page, review the configuration on the final page, then click on “Create Service”.

Once the service creation process is complete, you should be able to go to the thumbnailer-service view from the thumbnailer-cluster page of the ECS console. Open the “Tasks” tab, press the refresh button a few times, and you should see the service tasks starting.

Once the service tasks have started, you can look at their output in the logs: go to the CloudWatch console, choose “Logs” from the left-hand sidebar, then select the /ecs/thumbnailer-task log group. There should be one log stream for each of the three tasks we started. If you click on one of them, you should see first the startup messages from Gunicorn, then Apache-style access logs for each task. We’ve not connected to the tasks yet, but you should see requests coming into the root URL from the ELB health checker, which is the process that decides whether the instances behind the load balancer are healthy or not.

We can find the URL for our application under the information for the thumbnailer-lb load balancer, which is accessible from the EC2 console (“Load Balancers” from the sidebar). The “DNS name” field gives a name that can be used to access the application: point your browser at that and you can do some thumbnailing in our containers running on ECS.

If you do a few examples via the load balancer DNS name, you’ll see that the server hostname changes from request to request, providing some evidence that we really are load balancing.

What can go wrong…

There are a few things that can go wrong with this setup. A couple are obvious errors, like putting resources that should be in the same VPC on different VPCs, but a bigger potential problem is a little more subtle.

We have two security groups involved here: one for the load balancer (thumbnailer-lb-sg) and one for the container (thumbnailer-sg). The load balancer faces the outside world, and so should have its HTTP port (port 80) open to public traffic. The container security group lives behind the load balancer and is not publicly accessible. It should have an ingress rule for the port used by the container, which is port 8000, not the default HTTP port.

If you get this wrong, nothing will work, because traffic from the load balancer to the containers will be silently refused. The easiest way to diagnose the problem is to look at the load balancer target group’s status page (available from the EC2 console). Under the “Targets” tab, this usually lists the instances that are part of the target group. If no traffic is getting through to the container instances, it will say something like “no healthy instances in this target group”. Any time you see this while ECS claims to be successfully starting tasks, take a look at the security group rules to make sure that you aren’t silently discarding all traffic from the load balancer to the application containers.

Part 3: Caching

Finally, we get to caching, which is what we mostly do at MemCachier. What we’re going to do in this final part of the tutorial is to create a MemCachier cache on AWS infrastructure in the same region as our application containers, to make the necessary code changes to our application to make use of the cache, to think about how to manage the secrets we need to use to connect to the cache, then to deploy our updated application. We’ll finish off by looking at some benchmarking results to prove that our cache is working.

The code for this part of the tutorial is in the thumbnailer-ecs-caching directory of the repository.

Using MemCachier for caching

MemCachier provides a fast and flexible multi-tenant cache system that’s compatible with the protocol used by the popular memcached software. When you create a cache with MemCachier, you’re provided with one or more endpoints that you can connect to using the memcached protocol, accessing your cache just as if you had set up your own memcached server. We’ve tried to make MemCachier as easy as possible to use: go to https://www.memcachier.com and sign up for an account, then create a free development cache:

  1. On the welcome page after confirming your email address for your new MemCachier account, press the “Create A Cache” button.
  2. Choose a name for the cache (thumbnailer-cache, maybe?).
  3. Select “Amazon Web Services (EC2)” as the provider for the cache.
  4. Leave the cache plan as the default, a free 25 MB development cache.
  5. Select the AWS region where you’ve set up your Django application from the dropdown.
  6. Press the “Create cache” button.

Once the cache is created, you can retrieve the server endpoint for the cache, along with the username and password to use for authentication, from the cache details displayed on the “Your Caches” page. Clicking on the “Analytics dashboard” button for the cache takes you to a page where you can view connection information and other statistics for your cache. (The statistics viewable for development caches are restricted, but production caches show constantly updating information about cache size usage, hit rates and eviction rates.)

Apart from better statistics management, the other main difference between development and paid production caches is that larger production caches provide multiple independent connection endpoints for load balancing and resiliency.

Caching in Django

To make use of a cache in Django, we need to add a cache backend, which is configured in settings.py. There are a number of options for cache backends, but we’re going to use the python-binary-memcached and django-bmemcached packages. The reason for choosing these packages is, first, that we need a package that supports the binary memcached protocol, which includes SASL authentication. (We do support the ASCII version of the memcached protocol, but authentication there requires a non-standard extension to the protocol.) Second, we don’t really want to use the standard pylibmc package, because it depends on an external C library. This isn’t a problem as such, but it would complicate building our Docker container image slightly, so we go with a pure Python solution instead.

After including the caching packages in our requirements.txt, we can add the following cache setup code to our settings.py:
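A minimal sketch of that cache setup, reading the connection details from the environment and using django-bmemcached’s documented backend path:

```python
import os

# Cache connection details supplied via environment variables (we'll
# see below how these get populated on ECS).
MEMCACHIER_SERVER = os.environ.get("MEMCACHIER_SERVER", "")
MEMCACHIER_USERNAME = os.environ.get("MEMCACHIER_USERNAME", "")
MEMCACHIER_PASSWORD = os.environ.get("MEMCACHIER_PASSWORD", "")

if MEMCACHIER_SERVER:
    # Binary memcached protocol backend from django-bmemcached, with
    # SASL authentication against the MemCachier endpoint.
    CACHES = {
        "default": {
            "BACKEND": "django_bmemcached.memcached.BMemcached",
            "LOCATION": MEMCACHIER_SERVER,
            "OPTIONS": {
                "username": MEMCACHIER_USERNAME,
                "password": MEMCACHIER_PASSWORD,
            },
        }
    }
```

If MEMCACHIER_SERVER isn’t set, no CACHES entry is defined and the application runs without caching.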

Here, we extract the cache connection details from environment variables (MEMCACHIER_SERVER, MEMCACHIER_USERNAME and MEMCACHIER_PASSWORD). We’ll see how to get values into those environment variables on ECS in a moment. If we have caching set up (detected by the presence of a valid MEMCACHIER_SERVER value), we set Django’s CACHES configuration variable to use the binary memcached protocol backend provided by the django-bmemcached package, supplying the MemCachier endpoint and login credentials.

That’s all that’s needed to allow Django to connect to the MemCachier cache we created. There are a number of different ways that a Django web application can make use of a cache. The most common approaches are to cache whole rendered pages or fragments of pages, or to cache the results of database queries. Those options are less relevant to our thumbnailer application, so we’re going to demonstrate an approach using application-dependent custom caching. You can do more or less anything with this that you like – for example, I’ve used it in the past for caching the results of complicated permissions policy calculations to avoid recalculations every time permissions need to be checked.

To make use of the cache for storing the results of our thumbnail generation, we need to make only a few small changes to the views.py file in the main Django app. We add an import line at the top of the file to get access to the main Django cache:

then we modify the call to the thumbnail generation code to check for the presence of a pre-rendered thumbnail in the cache before calling the make_thumbnail function:

We construct a key for the cache entry from the thumbnail image size and the original image URL. If the rendered thumbnail is in the cache, we use that data directly. Otherwise we render the thumbnail using the make_thumbnail function as before, and store the result in the cache for later use.

What this means is that, until we fill our cache up, we only ever render any individual thumbnail once and after that always access the rendered thumbnail image data from the cache. Once our cache fills up, MemCachier will evict items in least recently used (LRU) order, but we don’t need to do anything special to manage that.

In more complex applications, we might want to set expiry times on our cached items, or explicitly delete items that we know are no longer valid, but for our thumbnailer, we just assume that image URLs are immutable and the thumbnails we produce are valid forever.

Managing secrets on ECS

Before we deploy our application with caching to ECS, we need to deal with the issue of secrets. We need some way to get values into the MEMCACHIER_SERVER, MEMCACHIER_USERNAME and MEMCACHIER_PASSWORD environment variables for our Django application to use to connect to our MemCachier cache. There are a few different ways to do this, but we’re going to demonstrate an approach that uses some AWS services, and that is both secure and more scalable and manageable than any ad hoc approach we might come up with ourselves.

The basic principle we’re going to follow here is that we want to decouple the management and deployment of code (our container image) from the management and access of secrets (login credentials for MemCachier, in this case, but the same approach applies for database credentials, encryption keys, and any other critical security data).

In general, you should not store security credentials with code: don’t check security keys into your GitHub repositories, don’t store them in container images, and so on. Security credentials should be held in a secure, audited store, ideally with an interface that makes it easy to manage access to credentials and to rotate them if there’s any sort of security leak (or just on a regular schedule, if your company has a policy on that in place).

We’re going to use two AWS services to do this here. The first is the Key Management Service (KMS), which is part of the IAM (Identity and Access Management) system. We’ll use this to create and manage an encryption key we can use to encrypt our secrets. The second is the Parameter Store service provided as part of the EC2 system, which gives us a secure key-value store for secret data that’s accessible via the AWS command-line interface.

Once we have these things set up, we’ll write an IAM policy that gives permissions to use the KMS key and to read parameters from our parameter store, and we’ll attach that policy to the IAM role that runs our ECS tasks. This policy+role approach is a pretty common way to manage access to AWS services for applications running on EC2 and/or ECS.

Creating an encryption key

Let’s start by creating an encryption key:

  1. Go to the IAM console and select “Encryption keys” from the left sidebar.
  2. Select the region that you’re using for your ECS deployment in the dropdown menu at the top of the key list that you see. (This user interface is different from the way that regions are treated in most of the rest of AWS, so be careful!) Once you have the right region selected, click on the “Create key” button.
  3. Give a name for the key (thumbnailer-key) and click on “Next step”.
  4. Click on “Next step” again to skip adding tags to the key.
  5. On the “Define Key Administrative Permissions” page, give your personal IAM user administrative permissions by filling in the checkbox next to your username. Click on “Next step”.
  6. On the “Define Key Usage Permissions” page, fill in the checkbox next to your IAM user name, but also fill in the checkbox next to the ecsTaskExecutionRole IAM role. Click on “Next step”.
  7. Finally, review the IAM policy for using the key, and click on “Finish”.

What we’ve done here is to create a key that you can manage using your IAM user account, but that is also usable by the role that runs the tasks in your ECS service. For a real production deployment, you would create an application-specific role for this purpose and use it during the ECS setup process. (This is particularly true if your application needs to access other AWS services to do its job.)

After the new key has been created, click on the key name and make a note of the ARN (Amazon Resource Name) for the new key, which you’ll need in the IAM policy we’ll create in a minute.

Set up secret parameters

Now we can create some secrets in the EC2 Parameter Store:

  1. Go to the EC2 console and select “Parameter Store” from the left sidebar. (It’s close to the bottom.) Make sure you’re in the right region (this time you use the main region menu, at the top right of the page).
  2. Click on the “Create Parameter” button.
  3. Fill in the parameter name (thumbnailer.memcachier-server), select “Secure String” for the “Type”, then choose the name of the KMS key you just created to use to encrypt the parameter. Finally copy the MemCachier cache endpoint into the “Value” field, e.g. mc1.dev.eu.ec2.memcachier.com:11211. Click on “Create Parameter”.
  4. Repeat steps 2 and 3 for the MemCachier username (call the parameter thumbnailer.memcachier-username) and password (thumbnailer.memcachier-password).

If you have the AWS command line tools set up, you can test that the parameters are accessible by doing something like this in a terminal:

aws ssm get-parameters --name thumbnailer.memcachier-server --with-decryption --region eu-west-1

adjusting the region to match whatever region you’re using. (This will only work if you have your AWS credentials set up so that the AWS CLI can run without asking for them – see here for how to set this up.)

Make secrets accessible to ECS task execution role

We now need to set things up so that the code running in our ECS containers can access the secrets we just created. To do this, we create an IAM policy that gives permissions to access the parameters and encryption key, and then add that policy to the IAM role that runs our ECS tasks.

To create the IAM policy:

  1. Go to the IAM console and click on “Policies” in the left sidebar.
  2. Click on “Create policy.”
  3. Click on the “JSON” tab to edit the raw JSON representation of the policy, and paste in the policy body from below, filling in the placeholders as appropriate. Click on “Review policy”.
  4. If there are no errors in the policy body, you’ll end up on a page where you can give a name to the policy (use thumbnailer-policy). Then click on “Create policy”.

Here is the JSON policy body to use – fill in the placeholders for your setup (<region> is the AWS region where your ECS setup is, <key-arn> is the ARN of the encryption key you created earlier, and <account-id> is your numeric AWS account ID, which you can see as the fifth component of the key ARN):
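A reconstruction of that policy body, matching the three statements explained in the next paragraph:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "ssm:DescribeParameters",
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": "ssm:GetParameters",
      "Resource": "arn:aws:ssm:<region>:<account-id>:parameter/thumbnailer.*"
    },
    {
      "Effect": "Allow",
      "Action": "kms:Decrypt",
      "Resource": "<key-arn>"
    }
  ]
}
```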

This policy has three statements in it, giving permissions to perform three different kinds of AWS operations. First, the ssm:DescribeParameters action allows an IAM user or role to list all of the parameters defined by the account they’re associated with (the * wildcard in the Resource value means “all values”). Second, the ssm:GetParameters action lets the user/role get the value of any parameter that matches the pattern in the Resource part of the clause, i.e. any parameter whose name starts with “thumbnailer.”, which matches the parameters we defined. Finally, the kms:Decrypt action allows the user/role to use the encryption key we created – we need this to recover the plaintext values of the parameters from their encrypted stored values.

Now we attach the thumbnailer-policy policy to the ecsTaskExecutionRole IAM role:

  1. Go to the IAM console and click on “Roles” in the left sidebar.
  2. Click on ecsTaskExecutionRole in the list of roles.
  3. Under the “Permissions” tab, click on “Attach policy”.
  4. Select the thumbnailer-policy in the policy list. You can either type part of the name into the search box to filter the (long) policy list, or you can select “Customer managed” in the “Filter” dropdown to show only the policies that you’ve created yourself.
  5. Click “Attach policy”.

The ecsTaskExecutionRole now has the permissions defined in the thumbnailer-policy policy, so that the tasks running in our ECS service will be able to access the secrets we’ve defined.

Pick up secrets in start-up script

So, at this point:

  • we have secrets stored in the EC2 Parameter Store;
  • the role we’re using to run our ECS tasks has sufficient permissions to be able to access the parameters;
  • our Django code accesses the parameters from environment variables.

The last thing we need to do is to make the link between these, by extracting the secret parameter values from the EC2 Parameter Store and putting them into environment variables for Django to access. We’ll do this by modifying our start.sh container startup script. (An alternative approach would be to add some Python code to our Django application to access the secrets directly, using the boto3 AWS client library. We’re going to use the AWS CLI to do it inside our startup script here, just for a demonstration.)

We add the following code to the top of our startup script:
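A sketch of what that preamble might look like – the metadata endpoint URL and the Python one-liners used for JSON parsing are illustrative assumptions, not necessarily the repository’s exact code:

```shell
# Work out which AWS region we're in from the ECS task metadata
# endpoint (the fourth field of the task ARN is the region).
REGION=$(curl -s http://169.254.170.2/v2/metadata \
         | python -c 'import json,sys; print(json.load(sys.stdin)["TaskARN"].split(":")[3])')

# Fetch and decrypt a single parameter from the EC2 Parameter Store.
get_param() {
    aws ssm get-parameters --name "$1" --with-decryption --region "$REGION" \
        | python -c 'import json,sys; print(json.load(sys.stdin)["Parameters"][0]["Value"])'
}

export MEMCACHIER_SERVER=$(get_param thumbnailer.memcachier-server)
export MEMCACHIER_USERNAME=$(get_param thumbnailer.memcachier-username)
export MEMCACHIER_PASSWORD=$(get_param thumbnailer.memcachier-password)
```

Note that this only works inside an ECS task, where the metadata endpoint is reachable and the task role’s credentials are available to the AWS CLI.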

This uses the ECS task metadata endpoint to determine what AWS region we’re operating in, then uses the AWS command-line interface to get the parameter values. The get-parameters sub-command is part of the ssm (Systems Manager) command, and returns a JSON document that looks like this:
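For example, for the thumbnailer.memcachier-server parameter created earlier, the response is shaped roughly like this:

```json
{
    "InvalidParameters": [],
    "Parameters": [
        {
            "Name": "thumbnailer.memcachier-server",
            "Type": "SecureString",
            "Value": "mc1.dev.eu.ec2.memcachier.com:11211",
            "Version": 1
        }
    ]
}
```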

The get_param function in our script just pulls the Value field out of this JSON, and we assign the results to the environment variables we need. To make this all work, we add the AWS command-line interface to our requirements.txt (it’s a Python package) so that it’s included in our container image.

One thing to note about this is that we don’t need to worry about setting up any sort of AWS credentials for running the AWS command-line interface inside our container. When you run code inside ECS (or on a manually managed EC2 instance), you have an AWS identity of some sort, either implicitly, or as in this case, because of an explicit assignment of an IAM role. The permissions that you have to perform AWS actions are derived from that identity. In the case here, all the operations that we need to achieve are permissioned via the IAM policy that we set up earlier.

Updating our deployment

Now that we’ve updated our code, we can rebuild our container image:
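Assuming the image is tagged with your Docker Hub username (a placeholder here), the rebuild is just:

```shell
docker build -t <docker-hub-username>/thumbnailer .
```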

and push it to the Docker Hub:
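again with a placeholder for your Docker Hub username:

```shell
docker push <docker-hub-username>/thumbnailer
```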

The only part of these steps that will take much time is installing the Python dependencies. The rest should be relatively quick, since most of the content of the container is already built and deployed to the Docker Hub.

We now need to deploy the new container image to ECS, and we need to force a turnover of the tasks running on ECS to use the new image. To do this, we go to the ECS console, find our thumbnailer-cluster cluster and open its cluster page, then select the “Services” tab. Select the thumbnailer-service service and click on “Update”. This reopens the service configuration wizard. We don’t need to make any changes at all to the configuration. All we need to do is to check the “Force new deployment” checkbox. Press the “Next step” buttons until you get to the final review page of the wizard, then hit the “Update Service” button.

Go to the service page for the thumbnailer-service service and you’ll see something interesting start to happen. Switch back and forth between the “Tasks” and “Deployments” tabs, and hit the refresh button now and then. You’ll see ECS provisioning and starting new tasks to replace the old ones, keeping track of the fact that the old tasks are associated with one deployment and the new tasks with a second deployment. The way we’ve set up the minimum and maximum task count percentages (which you can see on the “Deployments” tab), all of the new tasks will be started before the old tasks are retired. By modifying these percentages, you can trade off between using extra resources during deployments and reducing service availability. In our case, the maximum percentage is 200%, so there is no interruption of service during the deployment.

This push-button deployment and management of task turnover is a major advantage of ECS. If you’ve ever had to deal with that scary moment of changeover during production deployments without this kind of support, you’ll appreciate how much worry it can remove from the process.

If you take a look in the CloudWatch logs, you should see that the Django application reports that it is running with caching enabled, since it was able to pick up the MEMCACHIER_SERVER environment variable that we pulled from the EC2 Parameter Store.

So, now, if everything is working, thumbnails generated by our Django code will be cached. Try creating a couple of thumbnails via the web application and look at the times reported to create them. If you repeat a request for some thumbnail size and image URL, you’ll see that the second and subsequent requests are much quicker than the initial request, because the thumbnail data can be pulled straight out of the cache, instead of downloading the original image URL and generating the thumbnail anew.

Another thing to notice if you do this is that our application is still load balanced across three different servers, but the cache is shared between all the instances, so that if one server generates a thumbnail for a new image, and a later request for the same thumbnail is routed to a different server behind the load balancer, the result comes straight out of the cache.

Benchmarking results

We can get a more concrete idea of how much of a difference caching makes using the Node.js benchmarking code described earlier. Running this with TEST_URL set to point to the load balancer DNS name for our application, we end up with a results file (results.json) that contains sections like this:
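As an illustration of the shape of one such section (the field names and server names here are invented for illustration; the timings follow the discussion below):

```json
"https://www.memcachier.com/assets/logo.png": {
  "128": [
    { "server": "ip-10-0-0-27",  "generateMs": 527, "totalMs": 1430 },
    { "server": "ip-10-0-1-143", "generateMs": 41,  "totalMs": 2270 },
    { "server": "ip-10-0-0-203", "generateMs": 38,  "totalMs": 890 },
    { "server": "ip-10-0-1-143", "generateMs": 40,  "totalMs": 950 }
  ]
}
```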

This shows, for one image URL (https://www.memcachier.com/assets/logo.png) and thumbnail size (128), the results of running a number of thumbnailing requests (the number of requests for each URL and size is random, because of the way the benchmarking code works). The results for the requests are shown in the order they happened, and for each request you see the name of the server that serviced the request, the time (in milliseconds) taken on the server to generate the thumbnail data, and the total request time as seen by the client (from the point of pressing the “Thumbnail it!” button to the point where the result page has stabilised in the browser).

There are three main things to notice:

  1. The requests are load balanced across three different servers;
  2. Thumbnail generation in the first request takes 527 ms, while in subsequent requests, it takes on average about 40 ms, because the thumbnail data is being pulled out of the cache;
  3. The total request times are quite variable (see especially the second request, the first that used the cached result).

The variability of the total request times is pretty typical of code that runs on cloud services: there are a great many sources of unpredictability and latency between a client and our server code, including the network connection from the client to the cloud service, and sources of latency inside AWS (due to scheduling of our code on the EC2 instances used to run our ECS tasks, network queuing and so on).

Conclusions

We’ve covered a lot of ground in this tutorial. Some things to take away from it:

  • Deploying container-based applications on ECS takes a little bit of setup, but the payoff is good. We get load balancing, log aggregation, easy scalability and easy redeployments.
  • Adding caching to Django applications, even when we do custom caching, is pretty easy, and using MemCachier means we don’t need to worry about managing caching infrastructure ourselves.
  • There’s quite a big step between running your application in development mode on your own machine and running it in production with resiliency, scalability, load balancing, and so on. Don’t underestimate how much work it takes to get things production-ready!

Appendix: Cleaning up

To clean up all of the AWS resources associated with this tutorial, do the following:

  1. Delete the thumbnailer-lb load balancer, the thumbnailer-tg load balancer target group and the thumbnailer-lb-sg security group (from the EC2 console, follow the links in the left sidebar to the load balancers, target groups and security groups list, and select the right entities and delete them).
  2. Delete the thumbnailer-service service, the thumbnailer-sg security group, and then the thumbnailer-cluster cluster (press the “Delete Cluster” button on the cluster page in the ECS console). This will delete the CloudFormation stack associated with the cluster, removing all EC2 instances, CloudWatch log groups, and networking infrastructure associated with the cluster. It might take a couple of attempts to get it to delete everything, because the timeout on the deletion of the CloudFormation stack doesn’t seem quite long enough in some regions. It’s also possible that deleting the CloudFormation stack may fail completely, erroneously saying that the associated VPC still has dependencies. If that happens, you can delete the VPC manually and rerun the ECS cluster deletion to clean up.
  3. Go to the thumbnailer-task task definition from the ECS console and deregister the task definition revision we have there. To do this, just click on the thumbnailer-task name, then select thumbnailer-task:1 in the resulting list and choose “Deregister” from the “Actions” dropdown. Once you’ve done this, the thumbnailer-task task definition will no longer appear in the main task definition list.
  4. Delete the /ecs/thumbnailer-task log group from CloudWatch (follow the “Logs” link in the left sidebar of the CloudWatch console).
  5. Delete the secret parameters from the EC2 Parameter Store and the thumbnailer-key encryption key from the IAM Key Management Service.

You might also want to remove the container images you uploaded to the Docker Hub, which you can do from the Docker Hub web interface.

Appendix: Prerequisites

Docker

There are also operating system packages for Docker on many platforms, but whether they’re up to date is a bit of a lottery. Rolling release distributions (Arch Linux, for example) will be up to date (I’m using Arch and have Docker v18 installed), but if you’re using something like Ubuntu, you might be better off following the installation instructions on the Docker website here. You’ll want to install what the Docker website calls the “Community Edition”, which is free.

Once you have it installed, try doing docker info to see if it’s working.

Benchmarking prerequisites

To run the benchmarking code, you’ll need the following:

  • Firefox: available as a normal operating system package;
  • NPM and Node.js: available as a normal operating system package or from https://nodejs.org/ (I’m using NPM v6.1.0 and Node.js v10.4.1);
  • A relatively recent Java runtime (this is used by the Selenium browser automation system we use to drive Firefox): this should be available as a normal operating system package, but there are often lots of confusing options available. I’ve been using OpenJDK 8, which seems to work fine, and should be available on most platforms.

Appendix: AWS, the absolute basics

In a nutshell, what AWS offers is on-demand computing infrastructure. That includes compute resources, storage, networking, plus a lot of different options for managing everything. These options are divided into services. We’ll be using only a small number of those services in this tutorial. The sections below describe the most essential of these.

Elastic Compute Cloud (EC2)

EC2 is the core of AWS: it allows you to start virtual machine instances running on Amazon hardware in data centres around the world. AWS divides the world into regions (e.g. us-east-1 in Northern Virginia, eu-central-1 in Frankfurt, ap-northeast-1 in Tokyo, and so on), each of which has a number of availability zones (AZs) (physically separate data centres, more or less). You can start a virtual machine running in one of those AZs from any operating system image you like, and once it’s up and running, it looks like any other machine: if it’s set up right, you can SSH into it, run a web app on it, whatever you like.

EC2 instances come in different instance types, which vary in size, cost and specialisation, from small instances for experimentation and throw-away use (e.g. t2.micro, which is free for the first year you’re on AWS) up to large instance types optimised for in-memory databases, storage applications, GPU processing, etc. (for example, a c5d.18xlarge is a compute-optimised instance type with 72 CPUs and 144 GB of memory).

You can start and manage EC2 instances yourself, but there are several AWS services to help: Elastic Beanstalk and CloudFormation are two of the more popular ones. (The EC2 service also deals with a couple of aspects of managing EC2 instances that don’t really fit into other services. In particular, load balancing is part of the EC2 service, which we’ll use a little.)

Identity and Access Management (IAM)

AWS has a complex and powerful policy-based permission system to control access to AWS resources. The IAM service manages all of this.

Once you have an AWS account, you can create IAM users with their own login credentials and restricted permissions limiting the AWS services and resources that those users are allowed to use (the original credentials you use to set up your AWS account act as “root” credentials that allow you to do anything in your account). IAM provides facilities for managing users in groups, attaching permissions policies to different users or groups, and so on.

One concept that IAM has that may not be familiar is the idea of roles. On Unix-based systems, we tend to use normal user accounts even for users that aren’t real people. For example, we have an lp user that runs the printing subsystem, a docker user that runs the Docker daemon, and so on. On AWS, instead of using “real” users for these “non-real” cases, we use roles. Like normal users, roles have permissions policies attached to them, controlling what AWS services they’re allowed to use. It’s possible to set EC2 instances up to start in a particular role (using something called an instance profile), which makes it easy to control what AWS services individual applications are allowed to use.

Virtual Private Cloud (VPC)

AWS provides virtualised networking infrastructure as well as virtualised compute resources. The main component of this that we’ll use is a virtual private cloud (VPC). This is a mechanism that essentially allows you to pretend that the AWS infrastructure that you use is isolated onto its own network, even though the machines you’re using are virtualised slices of larger machines, the network addresses you use are translated into a part of a larger internal AWS network address space, and so on.

When you create a VPC, you give a range of internal IP addresses to allocate to it (as a CIDR block, usually in the 10.0.0.0 private address range). A VPC then has a number of subnets created in it, each allocated a section of the VPC’s address range. When you launch EC2 instances, you specify which subnet they live in. The idea here is that a VPC is specific to a region, but subnets are specific to availability zones, meaning that you can create a virtual network that spans multiple data centre locations with very little effort: just create a VPC in the region that you want, create a couple of subnets in different AZs, then you can launch EC2 instances in the different subnets and have resilience to data centre problems.

Ingress and egress of network traffic from entities in a VPC are governed by security groups, sets of rules saying what types of TCP/IP traffic are allowed in or out of, for example, an EC2 instance. You can restrict incoming traffic by port or origin IP address, for example, so you could open up ports 80 and 443 (HTTP and HTTPS) to the world, while keeping port 22 (SSH) restricted to access from known machines that you control.