This chapter covers: what Docker is and why it is important; the software problems Docker solves, such as installing, removing, upgrading, distributing, trusting, and managing software; how containers differ from virtual machines; when and where to use Docker; and how Docker works at a high level.
If you are like a lot of people, you might have already heard a few things about Docker but are not sure whether it is right for you or your organization. Maybe you have the impression that it is a fad technology. Before you make that call, I’d recommend that you try it. You’re likely to be as surprised as I was.
At the moment, Docker only works with Linux software, but you can use Docker and run all of the examples in this book on Linux, OSX, and Windows thanks to a utility called Boot2Docker.
Suppose you like to try out new Linux software but are worried about running something malicious. Running that software with Docker is a great first step in protecting your computer because Docker helps even the most basic software users take advantage of powerful security tools.
If you are a system administrator, making Docker the cornerstone of your software management toolset will save you time and let you focus on high value activities because Docker minimizes the time that you will spend doing mundane tasks.
If you write software, distributing your software with Docker will make it easier for your users to install and run. Writing your software in a Docker wrapped development environment will save you time configuring or sharing that environment, because from the perspective of your software every environment is the same.
Suppose you own or manage large-scale systems or data centers. Creating build, test, and deployment pipelines is simpler with Docker because moving any software through such a pipeline works exactly like moving any other software through it.
Part 1 of this book will cover the basic mechanics of working with software through Docker, common use cases, and how Docker works. Part 2 will cover how to build or package software to work well with Docker. Finally, Part 3 will cover Docker as a foundation for more advanced use cases like build-test-deploy software pipelines, managing software in clustered computing environments, and versioned development environments.
Docker works with your operating system to package, ship, and run software. You can think of Docker as a software logistics provider. It is currently available for Linux-based operating systems, but that is changing fast. Both software authors and users can apply it to network applications like web servers, databases, and mail servers; to terminal applications like text editors, compilers, network analysis tools, and scripts; and in some cases even to GUI applications like web browsers and productivity software. Docker will find new uses as operating systems grow to offer new features. Help with software logistics is more important than ever because we depend on more software than ever. Docker is not a programming language, and it is not a framework for building software. Docker is a tool that helps solve common problems with installing, removing, upgrading, distributing, trusting, and managing software.
Docker is open source, which means that anyone can contribute to it and it has benefited from a variety of perspectives. It is common for companies to sponsor the development of open source projects. In this case, Docker Inc is the primary sponsor. You can find out more about Docker Inc. at https://docker.com/company/.
Every week I read a few stories about difficulties installing, upgrading, removing, distributing, trusting, and managing software. Some are particularly horrific, describing wasted time, frustration, and service outages. I have had personal experiences where I tried to install software for up to eight hours before giving up and finding an alternative solution.
Software installation experiences usually fall into one of two categories. Either an installation program hides everything it does to install a program on your computer, or the software comes with complicated instructions. In either case, installing software requires several changes to your computer. The worst-case scenario happens when two programs cannot run on the same computer, forcing a user to make tradeoffs.
Upgrading installed software introduces an opportunity for the same incompatibilities you encounter during installation. Some tools exist for resolving those conflicts, but they are often domain specific.
Assumptions that software authors have to make about where users will install their work make software distribution just as challenging. While software authors want their work to reach the broadest possible audience, real world considerations like time and cost limit that audience.
People and institutions that suffer most from software problems deploy software to several computers. As the scale of a deployment increases, so does the general complexity of these problems. Every piece of software and every computer introduced multiplies that complexity.
Trust issues are the most difficult problems to solve. Even if you trust the source of your software, how can you trust it not to break under attack? Building secure computing environments is challenging and out of reach for most users.
Using software is complex. Before installation you have to consider what operating system you're using, the resources the software requires, what other software is already installed, and what other software it depends on. You need to decide where it should be installed. Then you need to know how to install it. It’s surprising how drastically installation processes vary even today. The list of considerations is long and unforgiving. Installing software is, at best, inconsistent and overcomplicated.
Most computers have more than one application installed and running, and most applications have dependencies on other software. What happens when two or more applications you want to use do not play well together? Disaster. Things are only made more complicated when two or more applications share dependencies.
The simple truth is that the more software you use, the more difficult it is to manage. Even if you spend the time and energy to figure out installing and running applications, how confident can anyone be about their security? Open and closed source programs release security updates continually, and just being aware of all of the issues is often unmanageable. The more software you run, the greater the risk that it is vulnerable to attack.
All of these issues can be solved with careful accounting, management of resources, and logistics. Those are mundane and unpleasant tasks. Your time would be better spent using the software that you are trying to install, upgrade, or publish. The people who build Docker recognized that, and thanks to their hard work you can breeze past these problems with minimal effort. I’m going to cover how containers and Docker solve these issues over the course of Part 1.
It is possible that most of these issues seem acceptable today. Maybe they even feel trivial because you're used to them. After reading how Docker makes these issues approachable, you may notice a shift in your opinion.
Without Docker, a computer can end up looking like a junk drawer. Applications have all sorts of dependencies. Some applications depend on specific system libraries for common things like sound, networking, and graphics. Others depend on the standard libraries for the language they are written in. Some depend on other applications, as a Java program depends on the Java Virtual Machine or a web application might depend on a database. It is common for a running program to require exclusive access to some scarce resource like a network connection or a file. Today, without Docker, applications are spread all over the place and end up creating a messy web of interactions. Figure 1.2 shows how example applications depend on example libraries without Docker.
The first thing Docker does to handle software logistics is get organized with containers. You can think of a Docker container like a shipping container: a box where you keep an application and all of its dependencies. Just as cranes, trucks, trains, and ships can easily work with shipping containers, Docker can copy, pick up, and move containers with equal ease.
Figure 1.3 illustrates these same applications and their dependencies running inside of containers. With the links broken, and each application neatly contained, understanding the system is an approachable task.
Without Docker, businesses typically use hardware virtualization to organize the mess. Virtual machines provide virtual hardware on which an operating system and other programs can be installed. They take a long time to create and require significant resource overhead because they run a whole copy of an operating system in addition to the software you want to use. While this is an acceptable solution for some, virtual machines are heavyweight and less commonly used by consumers.
Unlike virtual machines, the virtualization layer that Docker uses is minimal. Docker starts programs using special operating system features that restrict each program’s access to resources. Because there is no additional layer between the program running inside the container and the computer’s operating system, no resources are wasted running redundant software. This is an important distinction: Docker is not a virtualization technology. Instead, it helps you use the container technology already built into your operating system. I’ll expand on this in section 1.6.
While Docker solves these issues, it does not compete with virtual machines outright. Later I’ll describe how Docker and virtual machine technologies are complementary.
Docker provides what is called an “abstraction.” Abstractions allow you to work with complicated things in simplified terms. So, in the case of Docker, instead of focusing on all of the complexities and specifics associated with installing an application, all we need consider is what software we’d like to install. Like a crane loading a shipping container onto a ship, the process of installing any software with Docker is identical to any other. The shape or size of the thing inside the shipping container may vary, but the way that the crane picks up the container will always be the same. All of the tooling is reusable for any shipping container.
This is mirrored for application removal. When you want to remove software, you simply tell Docker which software to remove. No lingering artifacts will remain, because they were all carefully contained and accounted for. Your computer will be as clean as it was before the installation. This has powerful implications that I’ll explore in Part 3 of the book.
Another software problem is that an application’s dependencies typically include a specific operating system. Portability between operating systems is a major problem for software users. While it is possible to have compatibility between Linux software and OSX, using that same software on Windows can be more difficult. Doing so can require building whole ported versions of the software. Even that is only possible if suitable replacement dependencies exist for Windows. This represents a major effort for the maintainers of the application and is frequently skipped. The unfortunate thing for users is that there is a whole wealth of powerful software out there that is difficult or impossible to use on their system.
At present, Docker runs natively on Linux and comes with a single virtual machine for OSX and Windows environments. This convergence on Linux means that software running in Docker containers need only be written once against a consistent set of dependencies. You might have just thought to yourself, “Wait a minute. You just finished telling me that Docker is better than virtual machines.” That is correct, but I also said that they are complementary technologies. Using a virtual machine to contain a single program is wasteful, especially when you are running several virtual machines on the same computer. On OSX and Windows, Docker uses a single small virtual machine to run all of the containers. By taking this approach, the overhead of running a virtual machine is fixed while the number of containers can scale up.
This new portability helps users in a few different ways. First, it unlocks a whole world of software that was previously inaccessible. Second, it is now feasible to run the same software, exactly the same software, on any system. That means that your desktop, your development environment, your company’s server, and your company’s cloud can all run the same programs. Running consistent environments is important. Doing so helps minimize the learning curve associated with adopting new technologies. It helps software developers better understand the systems that will be running their programs. It means fewer surprises. Third, software maintainers can focus on writing their programs for a single platform and set of dependencies. This is a huge time saver for them and a great win for their customers.
Without Docker or virtual machines, portability is commonly achieved at an individual program level by basing the software on some common tool. For example, Java lets programmers write a single program that will mostly work on several operating systems because the programs rely on a program called a Java Virtual Machine (JVM). While this is an adequate approach when you are writing your own software, other people at other companies wrote most of the software we use. For example, if there is a popular web server that I want to use, but it was not written in Java or another similarly portable language, I doubt that the authors would take the time to rewrite it for me. In addition to this shortcoming, language interpreters and software libraries are the very things that create dependency problems. Docker improves the portability of every program regardless of the language it was written in, the operating system it was designed for, or the state of the environment where it is running.
Most of the things I’ve written about so far have been problems from the perspective of working with software, and the benefits of doing so from outside of a container. But containers also protect us from the software running inside of a container. There are all sorts of ways that a program might misbehave or present a security risk.
Any way you cut it, running software puts the security of your computer at risk. Since running software is the whole point of having a computer, it is prudent to apply practical risk mitigations.
Historically, Unix-style operating systems have used the term “jail” to describe a modified runtime environment that prevents a program from accessing protected resources. Since the release of Sun’s Solaris 10 and Solaris Containers in 2005, “container” has become the preferred term for such a runtime environment.
Like physical jail cells, anything inside a container can only access things that are inside as well. There are exceptions to this rule, but only when explicitly created by the user. Containers limit the scope of impact that a program can have on other running programs, the data it can access, and system resources. Figure 1.4 illustrates the difference between running software outside and inside of a container.
Using containers has been a best practice for a long time, but building containers manually is challenging and easy to get wrong. This challenge has put them out of reach for some, and misconfigured containers have lulled others into a false sense of security. Docker helps solve this problem: any software run with Docker runs inside a container. Docker uses existing container engines to provide consistent containers built according to best practices. This puts stronger security within reach for everyone.
You might be thinking, “So what?” Well, what this means for you or your business is that the scope of any security threat associated with running a particular application is limited to the scope of the application itself. Creating strong application containers is incredibly complicated and a critical component of any defense-in-depth strategy. It is far too commonly skipped or implemented in a half-hearted manner. With Docker, users get containers at a much lower cost. As Docker and its container engines improve, you get the latest and greatest jail features. Instead of keeping up with the rapidly evolving and highly technical world of building strong application jails, you can let Docker handle the bulk of that for you. This will save you a lot of time and money and bring peace of mind.
Docker is important for four reasons. First, the Docker container abstraction simplifies the way we work with software. That alone will save us all time, money, and energy. In the last two years several related tools have already been built on top of these abstractions and commoditize workflows that were previously only available at the largest technology companies in the world. Individuals and small to medium sized businesses will be more productive than ever.
Second, there is significant push in the software community to adopt Docker. This push is so strong that companies like Amazon, Microsoft, and Google have all worked together to contribute to its development and adopt it in their own cloud offerings. Even these companies, that are typically so at odds, have come together to support an open source project instead of developing and releasing their own solutions. Let that sink in for a moment.
This is one of those occasions where the bleeding edge has developed so quickly that established technology giants have already integrated it into their production cloud offerings. This rapid adoption, by even the most cautious companies with so much to lose, should help eliminate the concern that the technology is unproven or too risky to adopt at home or in smaller firms.
The third reason that Docker is important is that it has accomplished for the computer what app stores did for mobile devices. It has made software installation, compartmentalization, and removal very simple. Better yet, Docker does it in a cross-platform and open way. Imagine if all of the major smartphones shared the same app store. That would be a pretty big deal. It’s possible, with this technology in place, that the lines between operating systems may finally start to blur and third-party offerings will be less of a factor in choosing an operating system.
Fourth, we are finally starting to see better adoption of some of the more advanced isolation features of operating systems. This might seem minor, but there are quite a few people out there trying to make computers more secure through isolation at the operating system level. It’s been a shame that their hard work has taken so long to see mass adoption. Containers have existed for decades in one form or another. But it is great that Docker helps us take advantage of those features without all of the complexity.
Docker can be used on most computers at work and at home. Practically, how far should this be taken?
Docker can run almost anywhere, but that doesn’t mean you’ll want to. For example, Docker can currently run only applications that can run on a Linux operating system. This means that software written exclusively for other operating systems, such as native Windows applications, cannot run inside a Docker container.
So, narrowing the conversation to software that typically runs on a Linux server or desktop, a solid case can be made for running almost any application inside a container. This includes server applications like web servers, mail servers, databases, and proxies. Desktop software like web browsers, word processors, email clients, and other tools are also a great fit. Even trusted programs are as dangerous to run as a program downloaded from the Internet if they process user-provided or network data. Running these in a container, as a user with reduced privileges, will help protect your system from attack.
Beyond the added defense in-depth benefit, using Docker for day-to-day tasks helps keep your computer clean. Keeping a clean computer will prevent you from running into shared resource issues and ease software installation and removal. That same ease of installation, removal, and distribution simplifies management of computer fleets and could radically change the way companies think about maintenance.
The most important thing to remember is when containers are inappropriate. Containers will not help much with the security of programs that have to run with full access to the machine. At the time of this writing, running such programs in containers is possible but more complicated. Containers are not a total solution for security issues, but they can be used to prevent many types of attacks. Remember, you should not use software from untrusted sources, especially if that software requires administrative privileges. That means it’s a bad idea to blindly run customer-provided containers in a colocated environment. We’ll cover this further in Chapter 5.
To work with Docker, you will need a basic understanding of its terminology and background. Some of this may already feel familiar; if not, it will be covered in much greater detail later.
Let’s tackle some high-level terminology. Docker containers are started from images. A Docker image is a bundled snapshot of all the files that should be available to a program running inside a container. You can create as many containers from an image as you want, and containers that were started from the same image do not share the changes they make to their files. Images use special tools provided by the operating system to layer changes made to their files. Every time a container is created from an image, a new image layer is created for that container. When you distribute software using Docker, you distribute these images, and the receiving computers start containers from them. Images are the “shippable” units in the Docker ecosystem. This subject is introduced in Chapter 2 and covered in depth in Chapter 6.
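To make the image/container relationship concrete, here is a sketch of a hypothetical terminal session. The container names web-a and web-b are made up, and it assumes an image named nginx can be fetched from the default registry. Two containers started from the same image get independent copies of its files:

```
$ docker run --detach --name web-a nginx   # first container, started from the nginx image
$ docker run --detach --name web-b nginx   # second, independent container from the same image
$ docker exec web-a touch /tmp/only-in-a   # change a file inside the first container
$ docker exec web-b ls /tmp                # the second container does not see that change
```

Each container writes its changes to its own layer; the underlying image itself is never modified.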
Docker works with two kinds of services to distribute images. These are called indexes and registries. Docker is configured by default to use the public index and registries provided by Docker Inc., called Docker Hub. Docker works with indexes to search for images that have been made available through a registry. Registries hold published images and related metadata. This distribution infrastructure provides the equivalent of a shipping network in the shipping container metaphor. While Docker Hub is popular and readily available, there are other providers, and it is possible to run your own registries. Working with public and private indexes and registries will be covered in depth in Part 2.
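As a brief, hedged sketch of how this works in practice, the following hypothetical session searches the default index for an image and then pulls a copy from the registry. The image name used here, postgres, is only an example of a popular published image:

```
$ docker search postgres    # ask the index which published images match "postgres"
$ docker pull postgres      # download the image and its layers from the registry
```

After the pull completes, the image is stored locally and can be used to start any number of containers.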
Almost everything that you’ll want to do with Docker can be accomplished with a single command. For more complex use-cases like managing software across whole fleets of computers an advanced user might write scripts or programs that use Docker. Several of these already exist and I’ll introduce you to a few of them later in the book.
The Docker program is aptly named “docker.” Like other command-line tools, its list of options and parameters is extensive, but in this case they are rarely cryptic. Running a container looks something like “docker run name-of-image.” If the image you specify is not available locally, Docker will locate it and install it for you. Stopping a container looks like “docker stop name-of-container.” Docker uses human-friendly names for all containers, so you won’t have to type long and confusing numbers on the command line. In some cases Docker even provides tab completion. In Linux, tab completion lets you shortcut typing out a whole word by typing its beginning and then hitting the Tab key. This comes in handy if you perform these tasks frequently. Docker provides a few other features to ease use. Figure 1.7 visualizes what Docker is doing behind the scenes when you install a Docker image.
Building images can be done either interactively using a command shell or by writing a Dockerfile. Building interactively means starting a container from a base image and then using the command line to modify that image from within the container. Once all of your changes have been made, you can commit them to a new image. While this is powerful, it can also be time consuming and cumbersome. Chapters 2 and 4 will go into greater detail about the relationship between images and containers and the implementation of images.
Using a Dockerfile to build images lets Docker do the heavy lifting for you. A Dockerfile is an ordered list of commands that creates a new image from some base image. Dockerfiles are a way to describe those changes to Docker rather than making them to the image manually. Chapter 6 will cover the use of Dockerfiles in depth. Figure 1.6 illustrates both workflows for creating and publishing new images.
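As a minimal sketch, a Dockerfile for a hypothetical application might look like the following. The base image, the installed package, and the file paths are illustrative assumptions, not a prescribed layout:

```
# Start from a base image fetched from the registry
FROM ubuntu:14.04

# Each instruction below records a new layer on top of the base image
RUN apt-get update && apt-get install -y python   # install a dependency
COPY ./app /opt/app                               # copy program files from the build context
CMD ["python", "/opt/app/main.py"]                # default program to run in new containers
```

Running “docker build” against a file like this produces a new image without any manual, interactive steps, and the build can be repeated by anyone who has the Dockerfile.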
Docker is a very powerful tool and by the end of Part 2 of the book, we will have explored the features that you will need to solve most software problems.
Docker works with the tools that operating systems provide, plus optionally a few supplemental libraries. Docker itself is quite easy to install and has been designed with only a minimal set of dependencies. As I stated earlier, Docker is itself a running program. When you tell Docker to start a new container from a specific image, it creates a new child process that can be managed and monitored. That child process is the program that you want to run. When that child process starts, Docker tells the operating system to wrap it in a container. Docker uses pluggable libraries like LXC to work with the operating system in building these containers. Because these libraries are pluggable, the operating systems and the features they provide can evolve separately from Docker itself. As operating systems and these libraries improve, Docker will benefit.
As I noted earlier, containers have existed for decades, and control groups have been part of Linux since 2007. Docker does not provide the container technology; it makes existing container technology simpler to use. To understand what containers look like on a system, let’s first establish a baseline. Figure 1.9 shows a basic, simplified computer system architecture.
Notice that the command-line interface runs in what is called user space memory, just like other programs that run on top of the operating system. Ideally, programs running in user space cannot modify kernel space memory. Broadly speaking, the operating system is the interface between all user programs and the hardware that the computer is running on.
You can see in figure 1.10 that running Docker means running two programs in user space. The first is the Docker daemon, the engine that manages and monitors images and containers. If installed properly, this process should always be running. The second is the Docker command-line interface, or CLI. This is the docker program that users interact with. If you want to start, stop, or install software, you issue a command using the docker program.
Figure 1.10 also visualizes three running containers. Each runs as a child process of the Docker daemon, wrapped in a container, with the delegate process running in its own memory subspace of user space. I’ve included this detail to reinforce that programs running inside a container can only access their own memory and resources as scoped by the container.
Computers running an operating system other than Linux look slightly different. Docker is currently available for OSX and Windows through a tool named Boot2Docker that creates a single thin Linux virtual machine for the purpose of running Docker. The Docker CLI still runs natively on the host operating system but communicates with the Docker daemon running inside the virtual machine. Such a system is visualized in Figure 1.11. Installing Docker for Linux, OSX, and Windows will be covered in Chapter 2.