The Case for Docker

Docker provides significant benefits to developers as part of the application development workflow. Containers isolate applications and services without the overhead of alternative solutions, namely virtual machines. Without Docker, a development environment is often a complex beast of platform-specific binaries, programming language version managers, and shell script hacks to keep environments and applications isolated. Further, upgrading the underlying operating system always introduces the fear that one’s carefully crafted development environment might break.

Virtual machines are a step above this uncertainty but consume significant system resources and, often, precious storage space on an SSD-backed development box. VMs still play a prominent role in the world of Docker, especially as a Docker host on platforms that don’t support Linux containers natively (e.g. OS X).

Docker also brings immense reproducibility and portability benefits to the table, which I’ll focus on in a future post. Docker promotes precise consistency across environments.

Avoiding Docker Bloat

When I first started using Docker in Ruby on Rails projects, my images would often end up being more than a gigabyte in size. Every application dependency, library, and Ruby gem would be packaged into the Docker image. With one application, this might not seem like a big deal, but with several, you’re most likely duplicating files in more than one spot and churning through disk space. Further, running commands like bundle install will be abysmally slow for larger apps.

Figure 1: Docker containers with their own, isolated set of dependencies.

Prior to using Docker, my Ruby gems were centralized in a location such as ~/.rbenv/versions/2.1.2/, so if I updated my Gemfile or started work on another application, running bundle install would require only a minute of patience or less. We need to figure out how to apply the same concept to Docker.

Data Volume Containers

Docker has a brilliant feature – the data volume container – that allows you to mount a directory in a container and then share it with other containers. We can now reproduce the equivalent of ~/.rbenv/versions/2.1.2/ but gain much more in the process.

Figure 2: A data volume container can be referenced by other containers.

The data volume, /ruby_gems/2.1, in the gems-2.1 container is mounted as a data volume in each of the application containers.

Example

First, create an empty, minimal container with a data volume mounted at /ruby_gems/2.1. Notice that for our data volume needs, we only need to create the container, not run it.

docker create -v /ruby_gems/2.1 --name gems-2.1 busybox

The busybox image is one of the smallest in the Docker registry, weighing in at less than 3 MB. You are going to use this container only for storing files, so nothing fancy is needed.

Next, build an image that will be used for your application container.

docker build -t my_project_name .

In the Dockerfile used to build this image, the following line is important:

ENV GEM_HOME /ruby_gems/2.1

When you run bundle install, gems will be written to that path.

Also notice I didn’t embed any bundler commands in the Dockerfile. There’s a good reason: data volumes can only be mounted when running containers, not when building images. This means that the output of any RUN instruction in your Dockerfile is baked into the image itself, which is exactly what we’re trying to avoid when installing gems.
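Pulling these pieces together, a minimal Dockerfile might look like the following sketch. The base image, application path, and port are illustrative assumptions, not taken from the original project:

```dockerfile
# Illustrative sketch; base image, paths, and port are assumptions.
FROM ruby:2.1

# Gems installed at runtime will land in the shared data volume.
ENV GEM_HOME /ruby_gems/2.1
ENV PATH /ruby_gems/2.1/bin:$PATH

WORKDIR /app
COPY . /app

EXPOSE 3000

# Deliberately no "RUN bundle install" here: the data volume isn't
# available at build time, so gems are installed by the start script
# when the container runs.
CMD ["./script/start"]
```

Note that GEM_HOME redirects where Bundler and RubyGems write installed gems, and adding its bin directory to PATH makes gem executables available.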

Finally, I can create an application container and reference the exposed volume from the gems-2.1 container. My app container calls a “start” script which bundles the gems, but since /ruby_gems/2.1 is mounted to the gems-2.1 container, all of the artifacts are persisted there.

docker run -d --volumes-from gems-2.1 my_project_name
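The “start” script itself isn’t shown here; a minimal sketch, assuming a Rails app serving on port 3000, might look like:

```shell
#!/bin/sh
# script/start -- illustrative sketch; server command and port are assumptions.
set -e

# Install gems into GEM_HOME (/ruby_gems/2.1), i.e. the shared data volume.
# "bundle check" keeps repeat startups fast: install only runs when needed.
bundle check || bundle install

# Replace the shell with the app server so it receives signals directly.
exec bundle exec rails server -b 0.0.0.0 -p 3000
```

Because the gems persist in the gems-2.1 volume, subsequent container starts skip the expensive install step entirely.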

Using Docker Compose

Docker Compose (formerly “Fig”) helps you manage multi-container applications. We can use the above technique with Compose too. A gotcha is that Compose can’t yet reference containers created outside of a project’s namespace, so we’ll still need to spin up the gems container manually:

docker create -v /ruby_gems/2.1 --name gems-2.1 busybox

or, if you’ve previously created the container but it’s in a stopped state:

docker start gems-2.1

A sample docker-compose.yml might look like this:

db:
  image: wyaeld/postgres:9.3
  ports:
  - "5432"
 
redis:
  image: mini/redis
  ports:
  - "6379"
 
dev:
  build: .
  command: ./script/start
  links:
  - db
  - redis
  ports:
  - "3000:3000"
  volumes:
  - .:/app
  volumes_from:
  - gems-2.1
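With the gems-2.1 container in place, the stack can then be brought up as usual. A hedged sketch of the full workflow (the || true simply ignores the error when the container already exists):

```shell
# Create the shared gems container once; no-op if it already exists.
docker create -v /ruby_gems/2.1 --name gems-2.1 busybox || true

# Build and start the db, redis, and dev services defined above.
docker-compose up -d
```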

Why Not Mount Data Volumes to the Host?

Some of you might be wondering: why a data volume container at all? Couldn’t we expose a data volume from the Docker host and centralize everything there? For example:

dev:
  build: .
  command: ./script/start
  links:
  - db
  - redis
  ports:
  - "3000:3000"
  volumes:
  - .:/app
  - /ruby_gems/2.1:/ruby_gems/2.1

The two-part entry, /ruby_gems/2.1:/ruby_gems/2.1, mounts the named path from the Docker host into the container and is identical to passing -v /ruby_gems/2.1:/ruby_gems/2.1 to the docker run command. In other words, anything the container writes to /ruby_gems/2.1 also appears on your local machine. Note that the two path names are arbitrary and don’t need to be identical.

The primary reason for not doing this is portability. Leaving gems on the Docker host makes it impractical to move the files to other hosts and environments. With the container model, on the other hand, we can actually export the gems container as a tar file:

docker run --volumes-from gems-2.1 -v $(pwd):/backup ubuntu tar cf /backup/gems-2.1.tar ruby_gems

Let’s dissect the above command. We’re running a command through the ubuntu image (any image that has tar installed will do). We attach the data volume from the gems-2.1 container and mount the current host directory at /backup. We then tar up the contents of the ruby_gems directory (from the gems-2.1 container) into /backup/gems-2.1.tar.

Another way to get files out of a container:

docker cp gems-2.1:/ruby_gems local_gems_dir

The previous command does not tar the contents, so you’ll have to handle that manually. But tar support is being added per this pull request (not yet released as of 3/10/15), which will produce a tar stream on stdout via:

docker cp gems-2.1:/ruby_gems -

Either technique helps us ship application dependencies as containers. In another environment, you can import the tar file as an image and build a new gems container from it:

cat gems-2.1.tar | docker import - gems-2.1
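Note that docker import produces an image, not a running data volume container. To recreate the gems container from the imported image, something like the following should work (illustrative; it relies on Docker seeding a new volume from the image’s files at the same path):

```shell
# The imported image contains /ruby_gems from the tar; declaring a volume
# at that path initializes the new volume with the image's files.
docker create -v /ruby_gems/2.1 --name gems-2.1 gems-2.1
```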

In OS X development environments, one issue with locally mounted data volumes is performance when data is shared with the host system. This is because the boot2docker virtual machine relies on VirtualBox’s much slower shared folder implementation. There are ways around this, such as using Docker Machine to point to a Vagrant-powered version of boot2docker, which I’ll discuss in a future post.

In closing, using a data volume container to share files between containers helps conserve storage space and provides portability benefits. My specific use case revolved around Ruby gems, but I hope this article has inspired you to think about other applications.

Since 2013, Healthcare Blocks has been powering healthcare technology companies by providing a secure, HIPAA-compliant platform to deploy applications.  We are healthcare veterans and have great empathy for the regulatory hurdles that healthcare technology companies must navigate. Our specially-designed technology automates and enforces many of the requirements mandated under the Health Insurance Portability and Accountability Act (HIPAA), enabling developers and engineers to focus on features and functionality, rather than regulatory compliance.

We started our business to provide the foundation for healthcare innovation, making the world a better, healthier place, one solution at a time. We want people to be happier and healthier, and we’re confident you do too. Don’t let HIPAA slow you down. If you have any comments or questions regarding HIPAA, please contact us.