How to Write Dockerfiles for Python Web Apps

TL;DR

This post is filled with examples ranging from a simple Dockerfile to multistage production builds for Python apps. Here’s a quick summary of what this guide covers:

  • Using an appropriate base image (Debian-based for development, Alpine for production).
  • Using gunicorn for hot reloading during development.
  • Optimising for Docker cache layers — placing commands in the right order so that pip install runs only when necessary.
  • Serving static files (bundles generated via React/Vue/Angular) using Flask’s static and templates folders.
  • Using a multi-stage Alpine build to reduce the final image size for production.
  • Bonus — using gunicorn’s --reload flag and reload_extra_files setting to watch for changes to files (HTML, CSS and JS included) during development.

If you’d like to jump right ahead to the code, check out the GitHub repo.

If you’d like to git push and deploy on the Hasura platform, check out hasura/hello-python-flask.

Contents

  1. Simple Dockerfile and .dockerignore
  2. Hot Reloading with gunicorn
  3. Running a single python script
  4. Serving Static Files
  5. Single Stage Production Build
  6. Multi Stage Production Build

Let’s assume a simple directory structure. The application is called python-app. The top level directory has a Dockerfile and an src folder.

The source code of your Python app lives in the src folder, which also contains the app’s dependencies in a requirements.txt file. For brevity, let’s assume that server.py defines a Flask server running on port 8080.

python-app
├── Dockerfile
└── src
    └── server.py
    └── requirements.txt

1. Simple Dockerfile Example

For the base image, we have used python:3.6, the latest version at the time of writing.
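A minimal development Dockerfile for this layout might look like the following sketch; the WORKDIR and the pip flags are assumptions, not from the original repo:

```dockerfile
FROM python:3.6

WORKDIR /app

# Copy requirements.txt first so this layer is cached: pip install
# re-runs only when the dependency list changes, not on every code edit.
COPY src/requirements.txt /app/requirements.txt
RUN pip install --no-cache-dir -r requirements.txt

# Copy the application code last; edits here don't invalidate the pip layer.
COPY src /app

EXPOSE 8080
CMD ["python", "server.py"]
```

The ordering of the COPY and RUN instructions is what gives you the cache behaviour described in the TL;DR.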

During an image build, Docker sends all files in the context directory to the Docker daemon. To improve build performance, exclude unneeded files and directories by adding a .dockerignore file to the context directory.

Typically, your .dockerignore file should look like this:

.git
__pycache__
*.pyc
*.pyo
*.pyd
.Python
env

Build and run this image:

$ cd python-app
$ docker build -t python-docker-dev .
$ docker run --rm -it -p 8080:8080 python-docker-dev

The app will be available at http://localhost:8080. Use Ctrl+C to quit.

Now let’s say you want your code changes to be picked up without rebuilding the image, i.e. during local development. In that case, mount the source code into the container and start and stop the Python server from inside it.

$ docker run --rm -it -p 8080:8080 -v $(pwd):/app \
             python-docker-dev bash
root@<container-id>:/app# python src/server.py

2. Hot Reloading with Gunicorn

Gunicorn is a Python WSGI HTTP server for UNIX, built on a pre-fork worker model. It can be configured with many options, passed either on the gunicorn command line or in a configuration file. Pass --reload (or set reload in the config file) and gunicorn will automatically restart its workers whenever your Python files change.

We’ll build the image and run gunicorn so that the server reloads whenever anything inside the app directory changes.

$ cd python-app
$ docker build -t python-hot-reload-docker .
$ docker run --rm -it -p 8080:8080 -v $(pwd):/app \
             python-hot-reload-docker bash
root@<container-id>:/app# gunicorn --config ./gunicorn_app/conf/gunicorn_config.py gunicorn_app:app

All edits to Python files in the app directory will trigger a reload, and the changes will be live at http://localhost:8080. Note that we have mounted the files into the container so that gunicorn can actually see the changes.

What about other files? If there are other types of files (templates, views etc.) that you want gunicorn to watch for changes, list them in the reload_extra_files setting, which accepts a list of file paths.
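A gunicorn_config.py along these lines would enable both behaviours; this is a sketch, and the bind address and watched paths are assumptions based on the layout used in this section:

```python
# gunicorn_config.py sketch: bind address and watched paths are
# assumptions based on the gunicorn_app layout above.
import glob

bind = "0.0.0.0:8080"
workers = 1

# Restart workers whenever Python source files change.
reload = True

# Also watch non-Python files such as templates, stylesheets and scripts.
reload_extra_files = (
    glob.glob("gunicorn_app/templates/**/*.html", recursive=True)
    + glob.glob("gunicorn_app/static/**/*.css", recursive=True)
    + glob.glob("gunicorn_app/static/**/*.js", recursive=True)
)
```

With this file in place, the gunicorn command shown above picks everything up via --config.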

3. Running a single python script

For a simple single-file script, you can skip the Dockerfile entirely and run the script using the official python image with docker run.

$ docker run -it --rm --name single-python-script \
             -v "$PWD":/app -w /app \
             python:3 python your-daemon-or-script.py

You can also pass arguments to your Python script. Since the above example mounts the current working directory, files in that directory can be passed as arguments as well.
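For illustration, a hypothetical your-daemon-or-script.py that takes a mounted file as an argument could look like this; the line-counting behaviour is an invented example, not from the original post:

```python
# Hypothetical your-daemon-or-script.py: counts the lines in a file
# whose path is passed as a command-line argument.
import argparse


def main(argv):
    parser = argparse.ArgumentParser(description="Count lines in a file")
    parser.add_argument("path", help="file to read, e.g. one mounted under /app")
    args = parser.parse_args(argv)
    with open(args.path) as f:
        return sum(1 for _ in f)
```

You would then append the filename to the docker run command above, e.g. `python your-daemon-or-script.py data.txt`, and the mounted data.txt would be visible inside the container.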

4. Serving Static Files

The above Dockerfiles assumed that you are running an API server with Python. Let’s say you want to serve your React/Vue/Angular app using Python. Flask provides a quick way to render static files: your HTML goes in the templates folder, and your CSS, JS and images go in the static folder.

Check out a sample hello world static app structure here.

In your server.py,

if __name__ == '__main__':
    app.run(host='0.0.0.0')

Note the host 0.0.0.0: this makes your container accessible from outside. If you don’t specify a host, Flask binds only to the localhost interface.
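Putting it together, the rest of server.py might look like the following sketch; the routes are illustrative assumptions, to be combined with the app.run block shown above:

```python
# server.py sketch; the /ping route is a hypothetical addition.
from flask import Flask, render_template

# By default, Flask looks for HTML in ./templates and assets in ./static.
app = Flask(__name__)


@app.route("/")
def index():
    # Renders templates/index.html; bundles built by React/Vue/Angular
    # should be referenced inside it via /static/... paths.
    return render_template("index.html")


@app.route("/ping")
def ping():
    # A trivial route, handy as a container health check.
    return "pong"
```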

5. Single Stage Production Build
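A single-stage production Dockerfile might look like this sketch; the gunicorn entrypoint and the server:app module path are assumptions based on the layout above:

```dockerfile
FROM python:3.6

WORKDIR /app

# Dependencies first, for layer caching. This assumes gunicorn is
# listed in requirements.txt.
COPY src/requirements.txt /app/requirements.txt
RUN pip install --no-cache-dir -r requirements.txt

COPY src /app

EXPOSE 8080
CMD ["gunicorn", "--bind", "0.0.0.0:8080", "server:app"]
```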

Build and run the all-in-one image:

$ cd python-app
$ docker build -t python-docker-prod .
$ docker run --rm -it -p 8080:8080 python-docker-prod

The image built will be ~700MB (depending on your source code), due to the underlying Debian layer. Let’s see how we can cut this down.

6. Multi Stage Production Build

With multi-stage builds, you use multiple FROM statements in your Dockerfile, but only the final stage produces the image that gets tagged. Ideally, that final stage is a tiny production image containing only the exact dependencies a production server requires.

This is especially useful when you are using modules with system dependencies or ones that require compilation; pycrypto and numpy are good examples of this type.
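A multi-stage sketch along these lines: compile wheels in an Alpine build stage that has the toolchain installed, then copy only the built wheels into a clean Alpine runtime stage. The stage names, paths and entrypoint are assumptions:

```dockerfile
# Stage 1: build wheels with the compiler toolchain available.
FROM python:3.6-alpine AS builder
RUN apk add --no-cache build-base        # gcc, musl-dev etc. for C extensions
WORKDIR /build
COPY src/requirements.txt .
RUN pip wheel --wheel-dir=/wheels -r requirements.txt

# Stage 2: runtime image with no build tools at all.
FROM python:3.6-alpine
WORKDIR /app
COPY src/requirements.txt .
COPY --from=builder /wheels /wheels
RUN pip install --no-index --find-links=/wheels -r requirements.txt \
    && rm -rf /wheels
COPY src /app

EXPOSE 8080
CMD ["gunicorn", "--bind", "0.0.0.0:8080", "server:app"]
```

Building the wheels on Alpine itself (rather than on Debian) matters because wheels compiled against glibc will not run on musl-based Alpine.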

With this approach, the image built with Alpine comes to around 90MB (depending on your source code), roughly an 8x reduction in size. The alpine variant is usually a very safe choice for reducing image sizes.

Note: all of the above Dockerfiles were written for Python 3; the same setup can be replicated for Python 2 with little to no change. If you are looking to deploy a Django app, you should be able to get a production-ready deployment with minimal tweaks to the Dockerfiles above.

Any suggestions to improve the ideas above? Any other use-cases that you’d like to see? Do let me know in the comments.

Join the discussion on Reddit and HackerNews :)


Hasura is an open-source engine that gives you realtime GraphQL APIs on new or existing Postgres databases, with built-in support for stitching custom GraphQL APIs and triggering webhooks on database changes.