In the previous chapter, we learned how to build a Docker image and the very basic steps required for running the resulting image within a container. In this chapter, we’ll first take a look at where containers came from and then dive deeper into containers and the Docker commands that control the overall configuration, resources, and privileges that your container receives.
You might be familiar with virtualization systems like VMware or Xen that allow you to run a complete Linux kernel and operating system on top of a virtualized layer, commonly called a hypervisor. This approach provides very strong isolation between virtual machines because each hosted kernel sits in separate memory space and has defined entry points into the actual hardware, either through another kernel or something that looks like hardware.
Containers are a fundamentally different approach where all containers share a single kernel and isolation is implemented entirely within that single kernel. This is called operating system virtualization. The libcontainer project gives a good, short definition of a container: “A container is a self-contained execution environment that shares the kernel of the host system and which is (optionally) isolated from other containers in the system.” The major advantage is resource efficiency, because you don’t need a whole operating system for each isolated function. Since you are sharing a kernel, there is one less layer of indirection between the isolated task and the real hardware underneath. When a process is running inside a container, there is only a very thin shim inside the kernel, rather than potentially calling up into a whole second kernel while bouncing in and out of privileged mode on the processor.
Unlike hardware virtualization such as that provided by VMware, a Linux container cannot run Windows applications. So containers are best thought of as a Linux technology where, at least for now, you can run any of your favorite Linux applications or servers. When thinking of containers, you should try very hard to throw out what you might already know about virtual machines and instead conceptualize a container as a wrapper around a process that actually runs on the server.
It is often the case that a revolutionary technology is an older technology that has finally arrived in the spotlight. Technology goes in waves, and some of the ideas from the 1960s are back in vogue. Docker, similarly, is a new technology whose ease of use has made it an instant hit, but it doesn’t exist in a vacuum. Much of what underpins Docker comes from more than 30 years of work in a few different arenas: from a system call added to the Unix kernel in the late 1970s, to tooling built on modern Linux. It’s worth a quick tour through how we got to Docker, because understanding that history helps you place it within the context of other things you might be familiar with.
Containers are not a new idea. They are a way to isolate and encapsulate a part of the running system. The oldest technologies in that area were the first batch-processing systems. You’d run a program for a while, then switch to run another program. There was isolation: you could make sure your program didn’t step on anyone else’s program. That’s all pretty crude now, but it’s the very first step on the road to Linux containers and Docker.
Most people would argue that the seeds for today’s containers were planted in 1979 with the addition of the chroot system call to Version 7 Unix. chroot restricts a process’s view of the underlying filesystem. The chroot system call is commonly used to protect the operating system from untrusted server processes like FTP, BIND, and Sendmail, which are publicly exposed and susceptible to compromise.
In the 1980s and 1990s, various Unix variants were created with mandatory access controls for security reasons. This meant you had tightly controlled domains running on the same Unix kernel. Processes in each domain had an extremely limited view of the system that precluded them from interacting across domains. A popular commercial version of Unix that implemented this idea was the Sidewinder firewall built on top of BSDI Unix. But this was not possible in most mainstream Unix implementations.
That changed in 2000 when FreeBSD 4.0 was released with a new command, called jail, which was designed to allow shared-environment hosting providers to easily and securely create a separation between their processes and those of their individual customers. FreeBSD jail expanded on chroot’s capabilities by restricting not just a process’s view of the filesystem, but everything it could do with the underlying system and with processes in other jails.
In 2004, Sun released an early build of Solaris 10, which included Solaris Containers; these later evolved into Solaris Zones. This was the first major commercial implementation of container technology and is still used today to support many commercial container implementations. In 2007, HP released Secure Resource Partitions for HP-UX, later renamed HP-UX Containers; and finally, in 2008, Linux Containers (LXC) were released in version 2.6.24 of the Linux kernel. The phenomenal growth of Linux Containers across the community did not really begin until 2013, with the inclusion of user namespaces in version 3.8 of the Linux kernel and the release of Docker one month later.
Companies that had to deal with scaling applications to the size of the Internet, with Google being a very early example, started pushing container technology in the early 2000s in order to facilitate distributing their applications across data centers full of computers. A few companies maintained their own patched kernels with container support for internal use. Google contributed some of its work to support containers into the mainline Linux kernel, as understanding about the broader need for these features began to increase in the Linux community.
In late 2013, months after the Docker announcement, Google released lmctfy, the open source version of the internal container engine it had been running for some years. By this time, Docker was already widely discussed in the press. It was the right combination of ease of use and enabling technology just at the right time. Other promising container engines, like CoreOS Rocket, have been released since, but Docker seems to have built up a head of steam that is currently powering it to the forefront.
So far we’ve started containers using the handy docker run command. But docker run is really a convenience command that wraps two separate steps into one. The first thing it does is create a container from the underlying image. This is accomplished separately using the docker create command. The second thing docker run does is execute the container, which we can also do separately with the docker start command.
Now let’s take a look at some of the ways we can tell Docker to configure our container when we create it.
When you create a container, it is built from the underlying image, but various command-line arguments can affect the final settings. Settings specified in the Dockerfile are always used as defaults, but you can override many of them at creation time.
By default, Docker randomly names your container by combining an adjective with the name of a famous person. This results in names like ecstatic_babbage and serene_albattani. If you want to give your container a specific name, you can do so using the --name argument.
As mentioned in Chapter 4, labels are key-value pairs that can be applied to Docker images and containers as metadata. When new Docker containers are created, they automatically inherit all the labels from their parent image.
It is also possible to add new labels to the containers so that you can apply metadata that might be specific to that single container.
You can then search for and filter containers based on this metadata, using commands like docker ps --filter label=key=value.
You can use the docker inspect command on the container to see all the labels that a container has.
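To make the inheritance and filtering behavior concrete, here is a small Python model of it. It is not Docker code: the function names and the team=payments label are hypothetical, and it assumes that a container-level label overrides an inherited image label with the same key. The filter mirrors the two forms a label filter accepts, a bare key (the label must exist) or key=value (the value must match too).

```python
from typing import Dict


def effective_labels(image_labels: Dict[str, str],
                     container_labels: Dict[str, str]) -> Dict[str, str]:
    """Labels a new container carries: everything inherited from its image,
    plus container-specific labels (assumed here to win on a key clash)."""
    merged = dict(image_labels)
    merged.update(container_labels)
    return merged


def matches_label_filter(labels: Dict[str, str], flt: str) -> bool:
    """Mimic a label filter value: 'key' matches when the key is present,
    'key=value' matches only when the value is equal as well."""
    key, sep, value = flt.partition("=")
    if key not in labels:
        return False
    return not sep or labels[key] == value


# Hypothetical containers, both built from an image labeled maintainer=ops.
containers = {
    "web1": effective_labels({"maintainer": "ops"}, {"team": "payments"}),
    "web2": effective_labels({"maintainer": "ops"}, {}),
}
payments = [name for name, labels in containers.items()
            if matches_label_filter(labels, "team=payments")]
print(payments)  # ['web1']
```

Both containers still match a bare maintainer filter, because that label was inherited from the image.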
By default, when you start a container, Docker will copy certain system files on the host, including /etc/hostname, into the container’s configuration directory on the host and then use a bind mount to link that copy of the file into the container. We can launch a default container with no special configuration like this:
Since we want to be able to interact with the container that we are going to create for demonstration purposes, we pass in a few useful arguments. The --rm argument tells Docker to delete the container when it exits, the -t argument tells Docker to allocate a pseudo-TTY, and the -i argument tells Docker that this is going to be an interactive session, and we want to keep STDIN open. The final argument in the command is the executable that we want to run within the container, which in this case is the ever useful /bin/bash.
If we now run the mount command from within the resulting container, we will see something similar to this:
While the device number will be different for each container, the part we care about is that the mount point is /etc/hostname. This links the container’s /etc/hostname to the hostname file that Docker has prepared for the container, which by default contains the container’s ID and is not fully qualified with a domain name.
We can check this in the container by running the following:
To set the hostname specifically, we can use the --hostname argument to pass in a more specific value.
Then, from within the container, we will see that the fully qualified hostname is defined as requested.
Just like /etc/hostname, the /etc/resolv.conf file is managed via a bind mount between the host and container. By default, this is an exact copy of the Docker host’s resolv.conf file. If we didn’t want this, we could use a combination of the --dns and --dns-search arguments to override this behavior in the container:
Within the container, it now looks like this:
Another important piece of information that you can configure is the MAC address for the container. Without any configuration, a container will receive a calculated MAC address that starts with the 02:42:ac:11 prefix. If you need to set it to a specific value, you can use the --mac-address argument, running something similar to this:
Normally you will not need to do that. But sometimes you want to reserve a particular set of MAC addresses for your containers in order to avoid conflicts with other virtualization layers that use the same private block as Docker.
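Where does the 02:42:ac:11 prefix come from? As we understand Docker’s default bridge networking, the calculated MAC is simply 02:42 followed by the four bytes of the container’s IPv4 address, so containers on the default 172.17.0.0/16 subnet all share that prefix. The Python sketch below illustrates that scheme; the function name is hypothetical, and user-defined networks or newer Docker releases may assign MAC addresses differently.

```python
import ipaddress


def default_mac_for(ip: str) -> str:
    """Sketch of the 02:42 + IPv4-bytes scheme. The leading 0x02 marks the
    address as locally administered and unicast."""
    octets = ipaddress.IPv4Address(ip).packed  # the 4 raw address bytes
    return ":".join(f"{b:02x}" for b in (0x02, 0x42, *octets))


# 0xac = 172 and 0x11 = 17, hence the 02:42:ac:11 prefix on 172.17.x.x.
print(default_mac_for("172.17.0.2"))  # 02:42:ac:11:00:02
```

This also shows why a hand-picked --mac-address outside that pattern is unlikely to collide with addresses Docker calculates for other containers on the same bridge.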
There are times when the default disk space allocated to a container or its ephemeral nature is not appropriate for the job at hand and it is necessary to have storage that can persist between container deployments.
For the times when we need to do this, we can leverage the -v argument to mount filesystems from the host server into the container. In the following example, we are mounting a directory from the host to a directory within the container:
In the mount options, we can see that the filesystem was mounted read-write on the container directory, as we expected.
If the container application is designed to write into that directory, then this data will be visible in the corresponding directory on the host filesystem and would remain available when this container was stopped and a new container started with the same volume mounted.
In Docker 1.5, a new argument, --read-only, was added that allows the root volume of your container to be mounted read-only so that processes within the container cannot write anything to the root filesystem. This prevents things like logfiles that a developer was unaware of from filling up the container’s allocated disk in production. When used in conjunction with a mounted volume, it ensures that data is only written into expected locations.
In our previous example, we could accomplish this by simply adding --read-only=true to the command.
If we look closely at the mount options for the root directory, we will notice that they are mounted with the ro option, which makes it read-only. However, the volume we mounted is still mounted with the rw option so that our application can successfully write to the one volume to which we have designed it to write.
When people discuss the types of problems that you must often cope with when working in the cloud, the concept of the “noisy neighbor” is often near the top of the list. The basic problem this term refers to is that other applications, running on the same physical system as yours, can have a noticeable impact on your performance and resource availability.
Traditional virtual machines have the advantage that you can easily and very tightly control how much memory and CPU, among other resources, are allocated to the virtual machine. When using Docker, you must instead leverage the cgroup functionality in the Linux kernel to control the resources that are available to a Docker container. The docker create command directly supports configuring CPU and memory restrictions when you create a container.
There is an important caveat here. While Docker supports CPU and memory limits, as well as swap limits, you must have these capabilities enabled in your kernel in order for Docker to take advantage of them. You might need to add these as command-line parameters to your kernel on startup. To figure out if your kernel supports these limits, run docker info. If you are missing any support, you will get warning messages at the bottom, like:
Docker thinks of CPU in terms of “cpu shares.” The computing power of all the CPU cores in a system is considered to be the full pool of shares. 1024 is the number that Docker assigns to represent the full pool. By configuring a container’s CPU shares, you can dictate how much time the container gets to use the CPU. If you want the container to be able to use roughly half of the computing power of the system when it is busy, then you would allocate it 512 shares. Note that these are not exclusive shares, meaning that assigning all 1024 shares to a container does not prevent all other containers from running. Rather, it’s a hint to the scheduler about how long each container should be able to run each time it’s scheduled. If we have one container that is allocated 1024 shares (the default) and two that are allocated 512, they will all get scheduled the same number of times. But if the normal amount of CPU time for each process is 100 microseconds, the containers with 512 shares will run for 50 microseconds each time, whereas the container with 1024 shares will run for 100 microseconds.
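The arithmetic in that paragraph can be written down as a tiny Python model. It encodes only the simplified picture used here (a 100-microsecond base slice scaled by shares/1024); the kernel’s Completely Fair Scheduler actually works with weights derived from the cgroup’s CPU shares rather than fixed slices, so treat the numbers as illustrative. What the model does show is that, under contention, each container’s fraction of the CPU ends up proportional to its shares.

```python
DEFAULT_SHARES = 1024  # Docker's default CPU-share value


def timeslice_us(shares: int, base_slice_us: float = 100.0) -> float:
    """Run time per scheduling turn under the chapter's simplified model:
    the slice scales linearly with shares relative to the 1024 default."""
    if shares <= 0:
        raise ValueError("CPU shares must be positive")
    return base_slice_us * shares / DEFAULT_SHARES


# One container at the default 1024 shares and two at 512, as in the text.
shares = {"big": 1024, "small-a": 512, "small-b": 512}
slices = {name: timeslice_us(s) for name, s in shares.items()}

# If every container is scheduled once per round, each one's fraction of the
# CPU is its slice divided by the total length of the round.
round_us = sum(slices.values())
fractions = {name: t / round_us for name, t in slices.items()}

print(slices)     # {'big': 100.0, 'small-a': 50.0, 'small-b': 50.0}
print(fractions)  # {'big': 0.5, 'small-a': 0.25, 'small-b': 0.25}
```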
Let’s explore a little bit how this works in practice. For the following examples, we are going to use a new Docker image that contains the stress command for pushing a system to its limits.
When we run stress without any cgroup constraints, it will use as many resources as we tell it to. The following command creates a load average of around 5 by creating two CPU-bound processes, one I/O-bound process, and two memory allocation processes:
If you run the top command on the Docker host, near the end of the two-minute run, you can see how the system is affected by the load created by the stress program.
If you want to run the exact same stress command again, with only half the amount of available CPU time, you can run it like this:
The -c 512 argument (the short form of --cpu-shares=512) is the flag that does the magic, allocating 512 CPU shares to this container. Note that the effect might not be noticeable on a system that is not very busy. That’s because the container will continue to be scheduled for the same time-slice length whenever it has work to do, unless the system is constrained for resources. So in our case, the results of a top command on the host system will likely look exactly the same, unless you run a few more containers to give the CPU something else to do.
It is also possible to pin a container to one or more CPU cores. This means that work for this container will only be scheduled on the cores that have been assigned to this container.
In the following example, we are running our stress container pinned to the first of two CPUs, with 512 CPU shares. Note that everything following the container image here are parameters to the stress command, not docker.
If we run top again, we should notice that the percentage of CPU time spent in user space (us) is lower than it previously was, since we have restricted two CPU-bound processes to a single CPU.
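Docker’s CPU-pinning option (--cpuset-cpus in current releases) takes a CPU list in the same syntax the Linux kernel uses for cpusets, such as 0, 0,1, or 0-3. To make that syntax concrete, here is a small Python parser for it. It is an illustrative sketch, not Docker’s own parser, and the function name is hypothetical.

```python
from typing import List


def parse_cpu_list(spec: str) -> List[int]:
    """Expand a Linux CPU list such as '0-2,5' into [0, 1, 2, 5]."""
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"descending range: {part!r}")
            cpus.update(range(start, end + 1))  # ranges are inclusive
        else:
            cpus.add(int(part))
    return sorted(cpus)


print(parse_cpu_list("0"))      # [0]
print(parse_cpu_list("0-2,5"))  # [0, 1, 2, 5]
```

On Linux, os.sched_getaffinity(0) returns the set of CPUs the current process may run on, which is the same idea applied to a single process rather than a container.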
We can control how much memory a container can access in a manner similar to constraining the CPU. There is, however, one fundamental difference: while constraining the CPU only impacts the application’s priority for CPU time, the memory limit is a hard limit. Even on an unconstrained system with 96 GB of free memory, if we tell a container that it may only have access to 24 GB, then it will only ever get to use 24 GB regardless of the free memory on the system. Because of the way the virtual memory system works on Linux, it’s possible to allocate more memory to a container than the system has actual RAM. In this case, the container will resort to using swap in the event that actual memory is not available, just like a normal Linux process.
Let’s start a container with a memory constraint by passing the -m (--memory) option to the docker run command:
When you use the -m option alone, you are setting both the amount of RAM and the amount of swap that the container will have access to. So here we’ve constrained the container to 512 MB of RAM and 512 MB of additional swap space. Docker supports the unit suffixes b, k, m, and g, representing bytes, kilobytes, megabytes, and gigabytes, respectively.
If you would like to set the swap separately or disable it altogether, then you need to also use the --memory-swap option. The --memory-swap option defines the total amount of memory and swap available to the container. If we rerun our previous command, like so:
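Then we are telling the kernel that this container can have access to 512 MB of memory and 256 MB of additional swap space. To disable swap completely within the container, set --memory-swap to the same value as -m. Setting --memory-swap to -1 does the opposite: it removes the swap limit, so the container can use as much swap as the host provides.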
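The rules for -m and --memory-swap fit in a short Python sketch. The helper names are hypothetical, and the assumption that k, m, and g are powers of 1024 reflects our understanding of how Docker parses memory sizes; check the docker run reference for your release before relying on exact byte counts.

```python
from typing import Optional

# Unit suffixes for memory sizes; assumed to be powers of 1024.
UNITS = {"b": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_size(value: str) -> int:
    """Convert a size such as '512m' to bytes; no suffix means bytes."""
    value = value.strip().lower()
    if value and value[-1] in UNITS:
        number, unit = value[:-1], value[-1]
    else:
        number, unit = value, "b"
    return int(number) * UNITS[unit]


def extra_swap(memory: str, memory_swap: Optional[str] = None) -> Optional[int]:
    """Bytes of swap a container may use on top of its RAM limit.

    --memory-swap is the total of RAM plus swap; None means unlimited swap.
    """
    ram = parse_size(memory)
    if memory_swap is None:
        return ram  # -m alone: the same amount again as swap
    if memory_swap.strip() == "-1":
        return None  # -1 lifts the swap limit; it does not disable swap
    total = parse_size(memory_swap)
    if total < ram:
        raise ValueError("--memory-swap must be at least as large as --memory")
    return total - ram  # 0 when both are equal, i.e. swap disabled


MB = 1024 ** 2
print(extra_swap("512m") // MB)          # 512
print(extra_swap("512m", "768m") // MB)  # 256
print(extra_swap("512m", "512m"))        # 0
```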
So, what happens if a container reaches its memory limit? Well, let’s give it a try by modifying one of our previous commands and lowering the memory significantly:
Where all our other runs of the stress container ended with the line:
We see that this run quickly fails with the line:
This is because the container tries to allocate more memory than it is allowed, and the Linux Out of Memory (OOM) killer is invoked and starts killing processes within the cgroup to reclaim memory. Since our container has only one running process, this kills the container.
Another common way to limit resources available to a process in Unix is through the application of user limits. The following is a list of the types of things that can usually be configured by setting soft and hard limits via the ulimit command:
You can then override these ulimits on a specific container by passing in values using the --ulimit argument.
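Values passed to --ulimit take the form type=soft[:hard], for example nofile=1024:2048. The Python sketch below parses that format to show how the soft and hard halves relate. It is a simplified illustration rather than Docker’s own parser. It assumes, as we understand Docker’s behavior, that a missing hard limit defaults to the soft limit and that a soft limit above the hard limit is rejected.

```python
from typing import Tuple


def parse_ulimit(spec: str) -> Tuple[str, int, int]:
    """Parse a ulimit override of the form <type>=<soft>[:<hard>]."""
    name, sep, limits = spec.partition("=")
    if not name or not sep or not limits:
        raise ValueError(f"expected <type>=<soft>[:<hard>], got {spec!r}")
    soft_text, _, hard_text = limits.partition(":")
    soft = int(soft_text)
    hard = int(hard_text) if hard_text else soft  # hard defaults to soft
    if soft > hard:
        raise ValueError("the soft limit may not exceed the hard limit")
    return name, soft, hard


print(parse_ulimit("nofile=1024:2048"))  # ('nofile', 1024, 2048)
print(parse_ulimit("nproc=512"))         # ('nproc', 512, 512)
```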
There are some additional advanced options that can be used when creating containers, but this covers many of the more common use cases. The Docker client documentation lists all the available options and is kept current with each Docker release.
Earlier in the chapter we used the docker create command to create our container. When we are ready to start the container, we can use the docker start command. Let’s say that we needed to run a copy of Redis, a common key-value store. We won’t really do anything with this Redis container, but it’s a long-lived process and serves as an example of something we might do in a real environment. We could first create the container using a command like the one shown here:
The command ends with the full hash that was generated for the container. However, if we didn’t know the full or short hash for the container, we could list all the containers on the system, whether they are running or not, using:
We can then start the container with the following command:
To verify that it’s running, we can run:
In many cases, we want our containers to restart if they exit. Some containers are just very short-lived and come and go quickly. But for production applications, for instance, you expect them to be up after you’ve told them to run. We can tell Docker to do that on our behalf.
The way we tell Docker to do that is by passing the --restart argument to the docker run command. It takes three values: no, always, or on-failure:#, where # is the maximum number of restart attempts. If restart is set to no, the container will never restart if it exits. If it is set to always, then the container will restart whenever the container exits with no regard to the exit code. If restart is set to on-failure:3, then whenever the container exits with a nonzero exit code, Docker will try to restart the container three times before giving up.
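The restart decision just described can be modeled in a few lines of Python. This sketches the policy logic only, not how Docker implements it: the class name is hypothetical, it assumes on-failure without a count retries indefinitely, and it ignores the increasing delay Docker inserts between restart attempts.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RestartPolicy:
    """Hypothetical model of a --restart value: no, always, or on-failure[:N]."""
    name: str
    max_retries: Optional[int] = None  # only used by on-failure

    @classmethod
    def parse(cls, spec: str) -> "RestartPolicy":
        name, _, count = spec.partition(":")
        if name not in ("no", "always", "on-failure"):
            raise ValueError(f"unknown restart policy: {spec!r}")
        if count and name != "on-failure":
            raise ValueError("only on-failure accepts a retry count")
        return cls(name, int(count) if count else None)

    def should_restart(self, exit_code: int, restarts_so_far: int) -> bool:
        """Decide whether to restart after the container exits."""
        if self.name == "always":
            return True  # restart regardless of the exit code
        if self.name == "on-failure" and exit_code != 0:
            # No count means keep retrying; otherwise stop after N attempts.
            return self.max_retries is None or restarts_so_far < self.max_retries
        return False  # "no", or on-failure after a clean (0) exit


policy = RestartPolicy.parse("on-failure:3")
# A container that keeps exiting with 137 (128 + SIGKILL, as after an OOM
# kill) is restarted three times and then left stopped.
print([policy.should_restart(137, n) for n in range(5)])
# [True, True, True, False, False]
```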
We can see this in action by rerunning our last memory-constrained stress container, this time with the --restart argument.
In this example, we will see the output from the first run appear on the console before it dies. If we run a docker ps immediately after the container dies, we will see that Docker is attempting to restart the container.
It will continue to fail because we have not given it enough memory to function properly. After five attempts, Docker will give up and we will see the container disappear from the output of docker ps.
Containers can be stopped and started at will. You might think that starting and stopping are analogous to pausing and resuming a normal process. It’s not quite the same, though. When stopped, the process is not paused; it actually exits. And when a container is stopped, it no longer shows up in the normal docker ps output. On reboot, Docker will attempt to start all of the containers that were running at shutdown. It uses this same start mechanism, which is also useful when testing or when restarting a failed container. If we only want to pause a container, we can use docker pause and docker unpause, discussed later. But let’s stop our container now:
Now that we have stopped the container, nothing is in the ps list! We can start it back up with the container ID, but it would be really inconvenient to have to remember that. So docker ps has an additional option (-a) to show all containers, not just the running ones.
The STATUS field now shows Exited (0), meaning that our container exited with a status code of 0. We can start it back up with all of the same configuration it had before:
We keep talking about the idea that containers are just a tree of processes that interact with the system in essentially the same way as any other process on the server. That means that we can send them Unix signals, which they can respond to. In the previous docker stop example, we sent the container a SIGTERM signal and waited for the container to exit gracefully. Containers follow the same process group signal propagation that any other process group would receive on Linux.
A normal docker stop sends a SIGTERM signal to the process. If you want to force a container to be killed if it hasn’t stopped after a certain amount of time, you can use the -t argument, like this:
This tells Docker to initially send a SIGTERM signal as before, but then, if the container has not stopped within 25 seconds, to send a SIGKILL signal to forcefully kill it. Although stop is the best way to shut down your containers, there are times when it doesn’t work and we need to forcefully kill a container.
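A timed docker stop follows a common pattern for shutting down any process: ask politely with SIGTERM, wait, and then send SIGKILL. As an illustration outside Docker, here is that pattern applied to an ordinary child process with Python’s subprocess module on a POSIX system. The stop_gracefully helper is hypothetical, and the child is a stand-in that just sleeps.

```python
import signal
import subprocess
import sys


def stop_gracefully(proc: subprocess.Popen, timeout: float = 25.0) -> int:
    """Send SIGTERM, wait up to `timeout` seconds, then SIGKILL if needed.

    Returns the exit status; on POSIX a negative value -N means the child
    was terminated by signal N.
    """
    proc.terminate()  # SIGTERM on POSIX
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL on POSIX: cannot be caught or ignored
        return proc.wait()


# A stand-in "service" that would otherwise sleep for a minute.
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
status = stop_gracefully(child, timeout=5)
print("exited on SIGTERM" if status == -signal.SIGTERM else f"status {status}")
```

A child that ignored SIGTERM would instead hit the timeout and be killed, giving a status of -signal.SIGKILL.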
We saw what it looks like to use docker stop to stop a container, but often if a process is misbehaving, you just want it to exit immediately. We have docker kill for that. It looks pretty much like docker stop:
A docker ps now shows that the container is no longer running, as expected. Just because it was killed rather than stopped does not mean you can’t start it again, though. You can just issue a docker start like you would for a nicely stopped container. Sometimes you might want to send another signal to a container, one that is not stop or kill. Like the Linux kill command, docker kill supports sending any Unix signal, using its --signal argument. Let’s say we wanted to send a USR1 signal to our container to tell it to do something like reconnect a remote logging session. We could do the following:
If our container actually did something with the USR1 signal, it would now do it. Since we’re just running a bash shell, though, it just continues on as if nothing happened. Try sending a HUP signal, though, and see what happens. Remember that a HUP is the signal that is sent when the terminal closes on a foreground process.
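For a signal like USR1 to do anything, the process inside the container has to install a handler for it. Here is a minimal Python sketch of what such a process might do. The counter stands in for a hypothetical logging reconnect, and the process signals itself with os.kill to simulate a docker kill delivering USR1.

```python
import os
import signal

reconnects = 0  # how many times a (hypothetical) log reconnect was requested


def on_usr1(signum, frame):
    """Handle USR1 by recording a reconnect request; real code would reopen
    its remote logging session here or set a flag for its main loop."""
    global reconnects
    reconnects += 1


signal.signal(signal.SIGUSR1, on_usr1)  # install the handler (main thread only)

# Simulate `docker kill --signal=USR1`, which delivers USR1 to the container's
# main process; here the process simply signals itself.
os.kill(os.getpid(), signal.SIGUSR1)
print(reconnects)
```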
Sometimes we really just want to stop our container as described above. But there are a number of times when we just don’t want our container to do anything for a while. That could be because we’re taking a snapshot of its filesystem to create a new image, or just because we need some CPU on the host for a while. If you’re used to normal Unix process handling, you might wonder how this actually works since containerized processes are just processes.
Pausing leverages the cgroups freezer, which essentially just prevents your process from being scheduled until you unfreeze it. This will prevent the container from doing anything while maintaining its overall state, including memory contents. Unlike stopping a container, where the processes are made aware that they are stopping via the SIGTERM signal, pausing a container doesn’t send any information to the container about its state change. That’s an important distinction. Several Docker commands use pausing and unpausing internally as well. Here’s how we pause a container:
If we look at the list of running containers, we will now see that the Redis container status is listed as (Paused).
Attempting to use the container in this paused state would fail. It’s present, but nothing is running. We can now resume the container using the docker unpause command.
After running all these commands to build images, create containers, and run them, we have accumulated a lot of image layers and container folders on our system. We can list all the containers on our system using the docker ps -a command and then delete any of the containers in the list with docker rm, as follows:
We can then list all the images on our system using:
We can then delete an image and all associated filesystem layers by running:
There are times, especially during development cycles, when it makes sense to completely clean off all the images or containers from your system. There is no built-in command for doing this, but with a little creativity it can be accomplished reasonably easily.
To delete all of the containers on your Docker host, you can use the following command:
And to delete all the images on your Docker host, this command will get the job done:
Newer versions of the docker ps and docker images commands both support a filter argument that can make it easy to fine-tune your delete commands for certain circumstances.
To remove all containers that exited with a nonzero state, you can use this filter:
And to remove all untagged images, you can type:
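The filter argument does this selection inside Docker. If you are instead scripting against the text output of docker ps -a (for example, with --format '{{.ID}}\t{{.Status}}'), a Python sketch like the following picks out the nonzero exits. The helper names and container IDs are hypothetical. It also assumes the Exited (N) wording of the STATUS column, which is human-oriented output rather than a stable interface.

```python
import re
from typing import Iterable, List, Optional, Tuple

_EXITED = re.compile(r"^Exited \((\d+)\)")


def exit_code(status: str) -> Optional[int]:
    """Exit code from a STATUS string such as 'Exited (137) 2 minutes ago',
    or None if the container has not exited."""
    match = _EXITED.match(status.strip())
    return int(match.group(1)) if match else None


def failed_containers(rows: Iterable[Tuple[str, str]]) -> List[str]:
    """IDs whose STATUS shows a nonzero exit code."""
    failed = []
    for container_id, status in rows:
        code = exit_code(status)
        if code is not None and code != 0:
            failed.append(container_id)
    return failed


# Hypothetical (ID, STATUS) pairs, e.g. split from tab-separated output.
rows = [
    ("a1b2c3", "Exited (0) 3 seconds ago"),
    ("d4e5f6", "Exited (137) 2 minutes ago"),
    ("0a9b8c", "Up 4 minutes (Paused)"),
]
print(failed_containers(rows))  # ['d4e5f6']
```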