The Case for Docker
Docker provides significant benefits to developers as part of the application development workflow. Containers are used to isolate applications and services, without the overhead seen in alternative solutions, namely virtual machines. Without Docker, a development environment is often a complex beast of platform-biased binaries, programming language version managers, and shell script hacks to keep environments and applications isolated. Further, upgrading the underlying operating system always introduces the fear that one’s carefully crafted development environment might break.
Virtual machines are a step above this uncertainty, but they consume significant system resources and, often, storage space, which is precious on an SSD-backed development box. VMs still play a prominent role in the world of Docker, especially as a Docker host on platforms that don’t support Linux containers natively (e.g. OS X).
Docker also brings immense reproducibility and portability benefits to the table, which I’ll focus on in a future post. Docker promotes precise consistency across environments.
Avoiding Docker Bloat
When I first started using Docker in Ruby on Rails projects, my images would often end up being more than a gigabyte in size. Every application dependency, library, and Ruby gem would be packaged into the Docker image. With one application, this might not seem like a big deal, but with several, you’re most likely reproducing files in more than one spot and churning through disk space. Further, running commands like “bundle install” will be abysmally slow for larger apps.
Prior to using Docker, my Ruby gems were centralized in a location such as ~/.rbenv/versions/2.1.2/, so if I updated my Gemfile or started work on another application, running bundle install would require a minute of patience at most. We need to figure out how to apply the same concept with Docker.
Data Volume Containers
Docker has a brilliant feature – the data volume container – that allows you to mount a directory in a container and then share it with other containers. We can now reproduce the equivalent of ~/.rbenv/versions/2.1.2/ but gain much more in the process.
The data volume, /ruby_gems/2.1, in the gems-2.1 container is mounted as a data volume in each of the application containers.
First, create a minimal, empty container with a data volume mounted at /ruby_gems/2.1. Notice that for our data volume needs, we only need to create the container, not run it.
docker create -v /ruby_gems/2.1 --name gems-2.1 busybox
The busybox image is one of the smallest in the Docker registry, weighing in at less than 3 MB. You are going to use this container only for storing files, so nothing fancy is needed.
Next, build an image that will be used for your application container.
docker build -t my_project_name .
Here’s a sample Dockerfile that can be used with the above. The following line is important:
ENV GEM_HOME /ruby_gems/2.1
When you bundle install, gems will be written to that path.
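The sample Dockerfile itself isn’t reproduced here; a minimal sketch could look like the following, where the ruby:2.1 base image, the /app working directory, and the exposed port are my assumptions rather than the original’s:

```dockerfile
# Sketch of an application image; base image, paths, and port are assumptions.
FROM ruby:2.1

# The important line: send installed gems to the shared data volume path
# instead of baking them into the image.
ENV GEM_HOME /ruby_gems/2.1
ENV PATH /ruby_gems/2.1/bin:$PATH

WORKDIR /app
COPY . /app

EXPOSE 3000
CMD ["./script/start"]
```

Note there is deliberately no RUN bundle install instruction; gems are installed by the start script at run time, so they land on the data volume rather than inside the image.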
Also notice I didn’t embed any bundler commands in the Dockerfile. There’s a good reason. Data volumes can only be exposed when running containers, not when building images. This means that the output of any RUN instruction in your Dockerfile will be added to the image, which is what we’re trying to avoid when installing gems.
Finally, I can create an application container and reference the exposed volume from the gems-2.1 container. My app container calls a “start” script which bundles the gems, but since /ruby_gems/2.1 is mounted to the gems-2.1 container, all of the artifacts are persisted there.
docker run -d --volumes-from gems-2.1 my_project_name
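The “start” script itself isn’t shown in this post; a minimal sketch, assuming it lives at ./script/start and the app is a Rails server, might be:

```shell
#!/bin/sh
# script/start -- hypothetical entrypoint; the actual script isn't shown
# in the post. Because GEM_HOME points at /ruby_gems/2.1, gems installed
# here land on the shared data volume, not in this container's filesystem.
bundle check || bundle install
exec bundle exec rails server -b 0.0.0.0 -p 3000
```

The first run pays the full bundle install cost; every container sharing the gems-2.1 volume afterward passes bundle check almost instantly.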
Using Docker Compose
Docker Compose (formerly “Fig”) helps you manage multi-container applications. We can use the above technique with Compose too. A gotcha is that it doesn’t support global namespaces yet, so we’ll still need to spin up the gems container manually:
docker create -v /ruby_gems/2.1 --name gems-2.1 busybox
or, if you’ve previously created the container but it’s in a stopped state:
docker start gems-2.1
A sample docker-compose.yml might look like this:
db:
  image: wyaeld/postgres:9.3
  ports:
    - "5432"
redis:
  image: mini/redis
  ports:
    - "6379"
dev:
  build: .
  command: ./script/start
  links:
    - db
    - redis
  ports:
    - "3000:3000"
  volumes:
    - .:/app
  volumes_from:
    - gems-2.1
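With the gems-2.1 container already created, the whole stack then comes up with the usual Compose command (fig up on older installs):

```shell
docker-compose up
```

Compose wires up the database, Redis, and dev containers, and the volumes_from entry attaches the shared gems volume to the dev container.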
Why Not Mount Data Volumes to the Host?
Some of you might be wondering: why a data volume container at all? Couldn’t we expose a data volume from the Docker host and centralize everything there? Example:
dev:
  build: .
  command: ./script/start
  links:
    - db
    - redis
  ports:
    - "3000:3000"
  volumes:
    - .:/app
    - /ruby_gems/2.1:/ruby_gems/2.1
The doubled path, /ruby_gems/2.1:/ruby_gems/2.1, mounts the named path from the Docker host into the container and is identical to passing -v /ruby_gems/2.1:/ruby_gems/2.1 to the docker run command. In other words, anything written to /ruby_gems/2.1 appears on your local machine. Note that the path names are arbitrary and don’t need to be identical.
The primary reason for not doing this is portability. Leaving gems on the Docker host makes it impractical to move the files to other hosts and environments. With the container model, on the other hand, we can export the gems container as a tar file:
docker run --volumes-from gems-2.1 -v $(pwd):/backup ubuntu tar cf /backup/gems-2.1.tar ruby_gems
Let’s dissect the above command. We’re running a command through the ubuntu image (which could be any image that has tar installed). We’re attaching the data volume from the gems-2.1 container, and mounting the current host directory at /backup inside the container. We then tar up the contents of the ruby_gems directory (from the gems-2.1 container) into /backup/gems-2.1.tar, which lands in the host directory.
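Setting Docker aside for a moment, the backup is plain tar mechanics: archive a directory, then list or extract it elsewhere. A quick local sketch, where the directory layout and fake gem file are purely illustrative:

```shell
# Stand in for the gems volume with a local directory.
mkdir -p ruby_gems/2.1
echo "fake gem" > ruby_gems/2.1/rails.gemspec

# Archive it, as the ubuntu container does into /backup/gems-2.1.tar.
tar cf gems-2.1.tar ruby_gems

# Verify the archive contents, then restore into another directory.
tar tf gems-2.1.tar
mkdir -p restore
tar xf gems-2.1.tar -C restore
cat restore/ruby_gems/2.1/rails.gemspec
```

The restore half is exactly what happens on the receiving end when you move the archive to another host.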
Another way to get files out of a container:
docker cp gems-2.1:/ruby_gems local_gems_dir
The previous command does not tar the contents, so you’ll have to handle that manually. But tar support is being added per this pull request (not yet released as of 3/10/15), which will produce a tar file via:
docker cp gems-2.1:/ruby_gems -
Either technique helps us ship application dependencies as containers. In another environment, you can import the tar file as an image:
cat gems-2.1.tar | docker import - gems-2.1
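docker import only turns the tarball into an image tagged gems-2.1; to get a usable data volume container back, create a container from that image with the same volume path. A sketch, relying on Docker’s behavior of seeding a new volume from the image’s contents at that path:

```shell
# Recreate the volume container from the imported image so that
# --volumes-from works as before. Docker initializes the fresh volume
# with the image's files at /ruby_gems/2.1.
docker create -v /ruby_gems/2.1 --name gems-2.1 gems-2.1
```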
In OS X development environments, one issue with locally mounted data volumes is performance when data is shared with the host system. This is because the boot2docker virtual machine relies on VirtualBox’s much slower shared-folder implementation. There are ways to get around this, such as using Docker Machine to point to a Vagrant-powered version of boot2docker, which I’ll discuss in a future post.
In closing, using a data volume container to share files between containers helps conserve storage space and provides portability benefits. My specific use case revolved around Ruby gems, but I hope this article has inspired you to think about other applications.