Going back to basics, there are two major parts of an operating system – the kernel and the user space. The kernel is a special program executed directly on the hardware or in a virtual machine – it controls access to resources and schedules processes. The other major part is the user space – the set of files, including libraries, interpreters, and programs, that you see when you log into a server and list the contents of a directory such as /usr or /lib.
Linux containers essentially break these two pieces of an operating system up even further, allowing them to be managed independently – the container host and the container image. The container host is made up of an operating system kernel and a small user space with a minimal set of libraries and daemons necessary to run containers. The container image is made up of the libraries, interpreters, and configuration files of an operating system user space, as well as the developer's application code.
A container Base Image is essentially the userspace of an operating system packaged up and shipped around, typically as an OCI/Docker image. The same work that goes into the development and release of a user space is necessary for a good base image.
This article will compare and contrast the strategies taken by different Linux distributions which are commonly used in container images.
What Should You Know When Selecting a Container Image?
As stated above, a container base image is essentially the bundled contents of an operating system user space. Think of it as the libraries such as glibc, libseccomp, zlib, or libsasl – also think of it as the language runtimes and interpreters such as PHP, Ruby, Python, Bash, and even Node.js. Even if your application relies strictly on an interpreted language, it probably still has dependencies on operating system libraries – this surprises many people. Here's why: interpreted languages commonly rely on external implementations for things that are difficult to write or maintain, like openssl (crypto in PHP), libxml2 (lxml in Python), or database extensions (mysql in Ruby). Even the JVM is written in C, which means it is compiled and linked against external libraries like glibc.
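A quick way to see this in practice: Python's standard ssl and zlib modules are thin wrappers over C libraries shipped in the operating system user space, and they report which versions they were linked against. A minimal sketch (the exact version strings will vary by distribution):

```python
import ssl
import zlib

# Both of these "interpreted language" modules are bindings to C
# libraries that live in the base image's user space.
print(ssl.OPENSSL_VERSION)  # the OpenSSL (or LibreSSL) build linked in
print(zlib.ZLIB_VERSION)    # the version of the zlib C library linked in
```

Swap the base image underneath and these values change, even though your Python code does not.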
This means that, just like a regular operating system, there are three major things you should think about when selecting a container base image: architecture, security, and performance of the user space embedded within it. The design and architecture of the base image (as well as your software supply chain) will have a profound effect on reliability, supportability, and ease of use over time.
Comparison of Images
Let’s explore some common base images and try to understand the benefits and drawbacks of each.
| Image Type | Red Hat Enterprise Linux 7 Standard Image | Red Hat Enterprise Linux Atomic Image | Fedora | Fedora Modular | CentOS | Debian Stable | Ubuntu LTS | Alpine |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| C Library | glibc | glibc | glibc | glibc | glibc | glibc | glibc | musl libc |
| Core Utilities | GNU Core Utils | GNU Core Utils | GNU Core Utils | GNU Core Utils | GNU Core Utils | GNU Core Utils | GNU Core Utils | BusyBox |
| Size Across Wire | 73MB | 30MB | 75MB | 33MB | 72MB | 45MB | 47MB | 2MB |
| Size on Disk | 200MB | 78MB | 230MB | 85MB | 192MB | 100MB | | |
| Life Cycle | 10 years | 6 months | 6 months | 6 months | Variable | 2 years | 5 years | Unknown |
| Compatibility Guarantees | Based on Tier | Generally within minor version | Generally within a major version | Generally within minor version | Follows RHEL | Generally within minor version | Generally within minor version | Unknown |
| Troubleshooting Tools | Integrated with Technical Support | Integrated with Technical Support | Tools Container | Tools Container | Tools Container | Standard Packages | Standard Packages | Standard Packages |
| Technical Support | Commercial & Community | Commercial & Community | Community | Community | Community | Community | Commercial & Community | Community |
| ISV Support | Large Commercial | Large Commercial | Large Community | Community | Community | Community | Large Community | Small Community |
| Security Tracking | OVAL Data, CVE Database, Vulnerability API & Errata, Lab Tools | OVAL Data, CVE Database, Vulnerability API & Errata, Lab Tools | Errata | Errata | Announce List, Errata | OVAL Data, CVE Database, & Errata | OVAL Data, CVE Database, & Errata | Limited Database |
| Proactive Security Response Team | Commercial & Community | Commercial & Community | Community | Community | None | Community | Commercial & Community | None |
| Automated Testing | Commercial | Commercial | None | None | None | None Found | None Found | None Found |
| Proactive Performance Engineering Team | Commercial | Commercial | Community | Community | Community | Community | Community | Community |
When evaluating a container image, or any Linux distribution in general, it's important to take some basic things into consideration. Which C library, package format, and core utilities are used may be more important than you think. Most distributions use the same tools, but Alpine Linux has special versions of all of these, for the express purpose of making a small distribution. But small is not all that matters.
Changing core libraries and utilities can have a profound effect on what software will compile and run. It can also affect performance and security, and cause subtle failures in the large and complex software stacks that are common today. Distributions have tried moving to smaller C libraries, and eventually moved back to glibc – the Debian Project and Elastic are two examples. Glibc just works, it works everywhere, and it has had a profound amount of testing and usage over the years. It's a similar story with GCC – tons of testing and automation.
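If you want to know which C library a given image actually ships, Python's standard library gives a rough answer. This is a best-effort sketch, not an official API: glibc identifies itself through `platform.libc_ver()`, while the musl check below relies on musl's conventional dynamic-loader naming (`/lib/ld-musl-<arch>.so.1`) and can be fooled by unusual layouts.

```python
import glob
import platform

def detect_libc():
    """Best-effort guess at the C library in this user space."""
    name, version = platform.libc_ver()
    if name:  # glibc announces itself here
        return name, version
    # Assumption: musl does not report via libc_ver(), but its loader
    # is conventionally named /lib/ld-musl-<arch>.so.1
    if glob.glob("/lib/ld-musl-*"):
        return "musl", "unknown"
    return "unknown", "unknown"

print(detect_libc())
```

Run inside a RHEL, Debian, or Ubuntu image this reports glibc and a version; inside Alpine it falls through to the musl loader check.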
Or, as one user stated simply:

> I run into more problems than I can count using alpine. (1) once you add a few packages it gets to 200mb like everyone else. (2) some tools require rebuilding or compiling projects. Resource costs are the same or higher. (3) no glibc. It uclibc. (4) my biggest concern is security. I don’t know these guys or the package maintainers.
We’ve seen countless other issues surface in our CI environment, and while most are solvable with a hack here and there, we wonder how much benefit there is with the hacks. In an already complex distributed system, libc complications can be devastating, and most people would likely pay more megabytes for a working system.
Moving into the enterprise space, it’s also important to think about things like Support Policies, Lifecycle, ABI/API Commitment, and ISV Certifications. While Red Hat leads in this space, all of the distributions work towards some level of commitment for each of these things. Many distributions backport patches, etc. In a container image, these are all critically important.
First, because your containers will end up running for a long time in production – just like VMs did. This is where compatibility guarantees come into play – many distributions target compatibility within a minor version, but can and do roll versions of important pieces of software. This can make an automated build work one day, and break the next. You need to be able to rebuild container images on demand, all day, every day.
Second, because the ecosystem of software that forms around a distribution will have a profound effect on your own ability to deploy applications in containers. Finally, remember that the compatibility of your C library and your kernel matters, even in containers. For example, if the C library in your container image moves fast and is not in sync with the container host's kernel, things will break.
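One way to see why this pairing matters: a container never boots its own kernel, so the user space in the image always runs against the host's kernel. A minimal check from inside any container:

```python
import os
import platform

# The kernel release comes from the container *host* – every container
# on that host reports the same value, regardless of base image.
print(os.uname().release)

# The C library version comes from the container *image* – this is the
# half the base image controls (empty on musl-based images like Alpine).
print(platform.libc_ver())
```

The first value is fixed by the host; the second is fixed by the image. Compatibility between the two is exactly the seam this paragraph is describing.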
Every distribution provides some form of package updates for some length of time. This is the bare minimum to really be considered a Linux distribution, and this affects your base image choice, because you need updates. But, there is definitely a good, better, best way to evaluate this.
- Good: the distribution in the container image produces updates for a significant lifecycle, say 5 years or more. Enough time for you and your team to get return on investment (ROI).
- Better: the distribution in the container image provides machine-readable data which can be consumed by security automation. The distribution provides dashboards and databases with security information and explanations of vulnerabilities.
- Best: the distribution in the container image has a security response team which proactively investigates and analyzes upstream code, and proactively patches security problems (CVEs don’t find & fix themselves).
Again, this is a place where Red Hat leads in all of the above approaches. The Red Hat Product Security team investigated more than 2,500 vulnerabilities, which led to 1,346 fixes in 2016. They also produce an immense amount of security data which can be consumed by security automation tools to make sure your container images are in good shape. Red Hat also provides the Container Health Index within the Red Hat Container Catalog to help users quickly analyze container base images.
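To make the "better" tier concrete, here is a minimal sketch of consuming machine-readable security data. The XML below is a hypothetical, heavily simplified OVAL-style fragment (real OVAL definitions published by distributions are far richer); the parsing uses only Python's standard library:

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified OVAL-style security definitions.
OVAL_XML = """
<oval_definitions>
  <definition id="oval:com.example:def:1">
    <title>libblah buffer overflow</title>
    <reference ref_id="CVE-2017-0001"/>
  </definition>
  <definition id="oval:com.example:def:2">
    <title>openssl padding oracle</title>
    <reference ref_id="CVE-2017-0002"/>
  </definition>
</oval_definitions>
"""

def cve_ids(xml_text):
    """Extract the CVE identifiers referenced by each definition."""
    root = ET.fromstring(xml_text)
    return [ref.get("ref_id") for ref in root.iter("reference")]

print(cve_ids(OVAL_XML))  # ['CVE-2017-0001', 'CVE-2017-0002']
```

This is the point of machine-readable data: a scanner can cross-reference these identifiers against the packages in your images without a human reading advisories.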
All software has performance bottlenecks as well as performance regressions in new versions as they are released. The thing to think about with a container image is – how does the Linux distribution inside my container image test things? Again, let’s take a good, better, best approach.
- Good: Use a bug tracker and collect problems as contributors and users report them. Fix them as they are tracked. Almost all Linux distributions do this.
- Better: Use the bug tracker and proactively build tests so that these bugs don’t creep back into the Linux distribution, and hence back into the container images. With the release of each new major or minor version, do acceptance testing around performance with regard to common use cases like Web Servers, Mail Servers, JVMs, etc.
- Best: Have a team of engineers proactively build out complete test cases, and publish the results. Feed all of the lessons learned back into the Linux distribution.
Once again, this is a place where Red Hat leads in all of the above approaches. The Red Hat performance team proactively tests Kubernetes and Linux, for example here & here, and then feeds all of the lessons learned back upstream. Here's what they are working on now.
When you are choosing the container image that is right for your application, please think through more than size. Even if size is a major concern, don't just compare base images. Compare the base image along with all of the software that you will put on top of it to deliver your application. With diverse workloads, this can lead to a final size that is not so different between distributions. Also, remember that supply chain hygiene can have a profound effect in an environment at scale.
Think through the good, better, best with regard to architecture, security and performance of the content that is inside of the Linux base image. The decisions are quite similar to choosing a Linux distribution for the container host because what’s inside the container image is a Linux distribution.
Even when looking at something like distroless containers, think about what a distribution really is – it's a set of content and updates to that content over some amount of time (lifecycle) – libraries, utilities, language runtimes, etc., plus the metadata which describes the dependencies between them. Yum, DNF, and APT already do this for you, and the people who contribute to these projects are quite good at it. Imagine rolling all of your own libraries – now, imagine a CVE comes out for libblah-3.2, which you have embedded in 422 different applications. Yeah, just use a Linux distribution; they already know how to handle that problem for you.
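To make that scenario concrete, consider the bookkeeping a distribution's package metadata does for you. Everything below is hypothetical – the app names, the library, and the versions are invented for illustration – but this is the query you suddenly need to answer the day the CVE lands:

```python
# Hypothetical inventory: which applications embed which vendored
# libraries (the thing a package manager would otherwise track for you).
inventory = {
    "billing-api": ["libblah-3.2", "zlib-1.2.11"],
    "auth-service": ["libblah-3.1"],
    "report-gen": ["libblah-3.2"],
}

def affected_by(lib, inventory):
    """Apps that embed the exact vulnerable library version."""
    return sorted(app for app, libs in inventory.items() if lib in libs)

print(affected_by("libblah-3.2", inventory))  # ['billing-api', 'report-gen']
```

With a distribution, libblah ships once as a package, the advisory names it, and one update fixes every consumer; with vendored copies, you have to build and maintain this inventory yourself across all 422 applications.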
In full transparency, I help lead our container initiatives at Red Hat, so take my recommendations with the full understanding that I am most educated on Red Hat’s approach and I am hence biased. Also, note that the opinions in this blog are mine and not those of my employer. When you work for a company, you don’t check your brain at the door. Given these facts – I did try to give each distribution a fair shake. If you have any criticisms or corrections, I will be happy to update this blog.
Finally, I want to close with some comments on a fairly old announcement that Docker was going to move all "official" images to Alpine Linux. Historically, and even today, most official Docker images are built off of different versions of Debian base images. But what does any of this mean to an end user?
I have no special insight into the projects and decisions that are made at Docker, but from feedback I have received, and analysis I have done myself, it appears that this move had not been completed as of this publishing (08/18/2017). Generally, the end user has to select the Alpine version of any given official image by choosing a specific tag (example: php & node.js), while others don't have an Alpine option in their latest version (example: java). I want to be clear – Docker Inc. defines what this official designation means and generally controls the design and build of all "official" images. There is a wide range in the quality of the architectures of the builds, as well as the documentation – obviously, they cannot possibly be experts on every piece of software within these container images. While I think there is some merit in their goals around patching, best practices, and documentation, the real value to an end user is debatable. While some of the official images appear to be collaborations with upstream projects or ISVs, many are simply images built by Docker Inc. to try to seed their ecosystem.
When Docker Inc. blindsided the Debian community with the Hacker News announcement that it was moving all official images to Alpine Linux base images, Dustin Kirkland from Canonical wrote a response. I think he did a decent job. He mentioned that base image size really isn't the most important thing to evaluate in a production environment – especially when you are caching images anyway. He even mentioned the ecosystem and how important the number of packages in a Linux distribution is, but I think this blog entry goes further in explaining all of the things that need to be thought about when choosing a base image.
At the end of the day, many end users still choose to build their own images anyway. Partially, because of trust, and partially because end users have such a wide range of needs that it’s nearly impossible for any one company to build for every permutation of user needs.
I certainly think Red Hat leads in all of the categories that I have set forth in this article – but I also think that Docker's stated goal of moving from Debian to Alpine was short-sighted and naive. I think it shows a lack of technical acumen in evaluating a Linux distribution and its capabilities – especially with regard to the resources that it takes to truly build and maintain a distro over time. I also think it shows a lack of business acumen, as they probably could have partnered with a commercial vendor. Docker's ability to maintain a Linux distribution is clearly stressed – look at the number of packages maintained by each contributor. So, I leave you with this final thought: Debian (or even Ubuntu) is better suited than Alpine in almost all of these categories, but I think RHEL is still the best 🙂 #justsayin
Update: I have made some changes around my description about how Official DockerHub images are built and what that means for an end user. Thank you Phil Estes, for solid feedback.