A Comparison of Linux Container Images

Aug 18, 2017

 

Background

Going back to basics, there are two major parts of an operating system – the kernel and the user space. The kernel is a special program executed directly on the hardware or virtual machine – it controls access to resources and schedules processes. The other major part is the user space – the set of files, including libraries, interpreters, and programs, that you see when you log into a server and list the contents of a directory such as /usr or /lib.

 

Linux containers essentially break these two pieces of an operating system apart even further, allowing them to be managed independently – as the container host and the container image. The container host is made up of an operating system kernel and a small user space with the minimal set of libraries and daemons necessary to run containers. The container image is made up of the libraries, interpreters, and configuration files of an operating system user space, as well as the developer’s application code.
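To make the split concrete, here is a quick, illustrative check (any base image shows the same thing; centos:7 is just an example): the kernel a container reports is the host’s, while the files it sees come from the image’s user space.

    uname -r                                       # the host's kernel
    docker run --rm centos:7 uname -r              # same kernel inside the container
    docker run --rm centos:7 cat /etc/os-release   # user space supplied by the image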

 

A container base image is essentially the user space of an operating system, packaged up and shipped around, typically as an OCI/Docker image. The same work that goes into the development and release of a user space is necessary for a good base image.
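One illustrative way to see this (commands assume a Docker CLI and use the centos:7 image as an example): export an image to a tarball and list it – it is just a user space worth of files.

    docker create --name tmp centos:7
    docker export tmp | tar tvf - | head    # /bin, /etc, /lib64, /usr ... an OS user space
    docker rm tmp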

This article will compare and contrast the strategies taken by the Linux distributions that are commonly used in container images.

 

What Should You Know When Selecting a Container Image

As stated above, a container base image is essentially the bundled contents of an operating system user space. Think of it as libraries such as glibc, libseccomp, zlib, or libsasl – and also as the language runtimes and interpreters such as PHP, Ruby, Python, Bash, and even Node.js. Even if your application relies strictly on an interpreted language, it probably still has dependencies on operating system libraries – this surprises many people. Here’s why: interpreted languages commonly rely on external implementations for things that are difficult to write or maintain, like openssl (crypto in PHP), libxml2 (lxml in Python), or database extensions (mysql in Ruby). Even the JVM is written in C, which means it is compiled and linked against external libraries like glibc.
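A quick way to convince yourself of this (paths are illustrative and vary by distribution): inspect an interpreter’s dynamic linkage and the libraries its modules wrap.

    # the interpreter itself is dynamically linked against the base image's libraries
    ldd /usr/bin/python2.7 | head
    # and its modules wrap system libraries such as OpenSSL
    python -c 'import ssl; print(ssl.OPENSSL_VERSION)'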

 

This means that, just like a regular operating system, there are three major things you should think about when selecting a container base image: architecture, security, and performance of the user space embedded within it. The design and architecture of the base image (as well as your software supply chain) will have a profound effect on reliability, supportability, and ease of use over time.

Comparison of Images

Let’s explore some common base images and try to understand the benefits and drawbacks of each.

 

Image Type | Red Hat Enterprise Linux 7 Standard Image | Red Hat Enterprise Linux Atomic Image | Fedora | Fedora Modular | CentOS | Debian Stable | Ubuntu LTS | Alpine
---------- | --- | --- | --- | --- | --- | --- | --- | ---
Architecture | | | | | | | |
C Library | glibc | glibc | glibc | glibc | glibc | glibc | glibc | musl libc
Packaging Format | rpm | rpm | rpm | rpm | rpm | dpkg | dpkg | apk
Core Utilities | GNU Core Utils | GNU Core Utils | GNU Core Utils | GNU Core Utils | GNU Core Utils | GNU Core Utils | GNU Core Utils | Busybox
Size Across Wire | 73MB | 30MB | 75MB | 33MB | 72MB | 45MB | 47MB | 2MB
Size on Disk | 200MB | 78MB | 230MB | 85MB | 192MB | 100MB | 120MB | 4MB
Life Cycle | 10 years | 6 months | 6 months | 6 months | variable | 2 years | 5 years | unknown
Compatibility Guarantees | Based on Tier | Generally within minor version | Generally within major version | Generally within minor version | Follows RHEL | Generally within minor version | Generally within minor version | Unknown
Troubleshooting Tools | Integrated with Technical Support | Integrated with Technical Support | Tools Container | Tools Container | Tools Container | Standard Packages | Standard Packages | Standard Packages
Technical Support | Commercial & Community | Commercial & Community | Community | Community | Community | Community | Commercial & Community | Community
ISV Support | Large Commercial | Large Commercial | Large Community | Community | Community | Community | Large Community | Small Community
Security | | | | | | | |
Updates | Commercial | Commercial | Community | Community | Community | Community | Community | Community
Tracking | OVAL Data, CVE Database, Vulnerability API & Errata, Lab Tools | OVAL Data, CVE Database, Vulnerability API & Errata, Lab Tools | Errata | Errata | Announce List, Errata | OVAL Data, CVE Database, & Errata | OVAL Data, CVE Database, & Errata | Limited Database
Proactive Security Response Team | Commercial & Community | Commercial & Community | Community | Community | None | Community | Commercial & Community | None
Performance | | | | | | | |
Automated Testing | Commercial | Commercial | None | None | None | None Found | None Found | None Found
Proactive Performance Engineering Team | Commercial | Commercial | Community | Community | Community | Community | Community | Community

 

Architecture

When evaluating a container image, or any Linux distribution in general, it’s important to take some basic things into consideration. Which C library, package format, and core utilities are used may be more important than you think. Most distributions use the same tools, but Alpine Linux uses special versions of all of these, for the express purpose of making a small distribution. But small is not all that matters.

Changing core libraries and utilities can have a profound effect on what software will compile and run. It can also affect performance and security, and cause subtle failures in the large, complex software stacks that are common today. Distributions have tried moving to smaller C libraries and eventually moved back to glibc – the Debian Project and Elastic are two examples. Glibc just works, it works everywhere, and it has had a tremendous amount of testing and usage over the years. It’s a similar story with GCC – tons of testing and automation.
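Here is a sketch of the kind of failure that surprises people (the binary name is hypothetical): a dynamically linked glibc binary requests a loader that a musl-based image simply doesn’t have.

    file ./myapp
    #   ELF 64-bit LSB executable ... interpreter /lib64/ld-linux-x86-64.so.2  (the glibc loader)
    docker run --rm -v "$PWD":/work alpine /work/myapp
    #   typically fails with "no such file or directory" because that loader does not exist on musl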

 

Or stated simply here:

I run into more problems than I can count using alpine. (1) once you add a few packages it gets to 200mb like everyone else. (2) some tools require rebuilding or compiling projects. Resource costs are the same or higher. (3) no glibc. It uclibc. (4) my biggest concern is security. I don’t know these guys or the package maintainers. 

Or here:

We’ve seen countless other issues surface in our CI environment, and while most are solvable with a hack here and there, we wonder how much benefit there is with the hacks. In an already complex distributed system, libc complications can be devastating, and most people would likely pay more megabytes for a working system.

 

Moving into the enterprise space, it’s also important to think about things like support policies, lifecycle, ABI/API commitments, and ISV certifications. While Red Hat leads in this space, all of the distributions work towards some level of commitment for each of these things, and many distributions backport patches. In a container image, these are all critically important.

First, because your containers will end up running for a long time in production – just like VMs did. This is where compatibility guarantees come into play: many distributions target compatibility within a minor version, but they can and do roll versions of important pieces of software. This can make an automated build work one day and break the next. You need to be able to rebuild container images on demand, all day, every day.
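One hedged sketch of what a rebuild you can run every day might look like (the image tag and package version are examples, not recommendations): pin the base image and the key package versions so a scheduled rebuild doesn’t silently roll them.

    # write a Dockerfile with a pinned base image and pinned packages, then build it
    {
      echo 'FROM centos:7.3.1611'
      echo 'RUN yum -y install httpd-2.4.6 && yum clean all'
    } > Dockerfile
    docker build --pull -t myapp-base:20170818 .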

 

Second, because the ecosystem of software that forms around a distribution will have a profound effect on your own ability to deploy applications in containers. Finally, remember that the compatibility of your C library and your kernel matters, even in containers. For example, if the C library in your container image moves fast and is not in sync with the container host’s kernel, things will break.

 

Security

Every distribution provides some form of package updates for some length of time. This is the bare minimum to really be considered a Linux distribution, and this affects your base image choice, because you need updates. But, there is definitely a good, better, best way to evaluate this.

  • Good: the distribution in the container image produces updates for a significant lifecycle, say 5 years or more. Enough time for you and your team to get return on investment (ROI).
  • Better: the distribution in the container image provides machine-readable data which can be consumed by security automation. The distribution provides dashboards and databases with security information and explanations of vulnerabilities (see the sketch after this list).
  • Best: the distribution in the container image has a security response team which proactively investigates and analyzes upstream code, and proactively patches security problems (CVEs don’t find & fix themselves).
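As a sketch of that “better” level, here is roughly what consuming machine-readable security data looks like with OpenSCAP (the OVAL file name and download location vary by distribution):

    # download your distribution's OVAL definitions first (vendor-specific URL), then:
    oscap oval eval --results results.xml --report report.html oval-definitions.xml
    # results.xml can be fed into CI tooling; report.html is a human-readable summary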

 

Again, this is a place where Red Hat leads with all of the above approaches. The Red Hat Product Security team investigated more than 2,500 vulnerabilities, which led to 1,346 fixes in 2016. They also produce an immense amount of security data which can be consumed by security automation tools to make sure your container images are in good shape. Red Hat also provides the Container Health Index within the Red Hat Container Catalog to help users quickly analyze container base images.

 

Performance

All software has performance bottlenecks as well as performance regressions in new versions as they are released. The thing to think about with a container image is – how does the Linux distribution inside my container image test things? Again, let’s take a good, better, best approach.

  • Good: Use a bug tracker and collect problems as contributors and users report them. Fix them as they are tracked. Almost all Linux distributions do this.
  • Better: Use the bug tracker and proactively build tests so that these bugs don’t creep back into the Linux distribution, and hence back into the container images. With the release of each new major or minor version, do acceptance testing around performance for common use cases like web servers, mail servers, JVMs, etc.
  • Best: Have a team of engineers proactively build out complete test cases, and publish the results. Feed all of the lessons learned back into the Linux distribution.

 

Once again, this is a place where Red Hat leads in all of the above approaches. The Red Hat performance team proactively tests Kubernetes and Linux, for example here & here, and then feeds all of the lessons learned back upstream. Here’s what they are working on now.

 

Recommendations

When you are choosing the container image that is right for your application, please think through more than size. Even if size is a major concern, don’t just compare base images; compare the base image along with all of the software that you will put on top of it to deliver your application. With diverse workloads, this can lead to final sizes that are not so different between distributions. Also, remember that supply chain hygiene can have a profound effect in an environment at scale.

 

Think through the good, better, best with regard to architecture, security and performance of the content that is inside of the Linux base image. The decisions are quite similar to choosing a Linux distribution for the container host because what’s inside the container image is a Linux distribution.

 

Even when looking at something like distroless containers, think about what a distribution really is – it’s a set of content, plus updates to that content over some amount of time (lifecycle): libraries, utilities, language runtimes, etc., plus the metadata which describes the dependencies between them. Yum, DNF, and APT already do this for you, and the people that contribute to these projects are quite good at doing it. Imagine rolling all of your own libraries – now imagine a CVE comes out for libblah-3.2, which you have embedded in 422 different applications. Yeah, just use a Linux distribution – they already know how to handle that problem for you.
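To make that concrete, here is the kind of question a distribution’s package manager can already answer for you (RPM/yum shown; apt and dpkg have equivalents, and the library path is illustrative):

    rpm -qf /lib64/libcrypto.so.10      # which package owns this library (e.g. openssl-libs)
    yum -q updateinfo list security     # which installed packages have pending security errata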

 

Parting Thoughts

In full transparency, I help lead our container initiatives at Red Hat, so take my recommendations with the full understanding that I am most educated on Red Hat’s approach and am hence biased. Also, note that the opinions in this blog are mine and not those of my employer. When you work for a company, you don’t check your brain at the door. Given these facts, I did try to give each distribution a fair shake. If you have any criticisms or corrections, I will be happy to update this blog.

 

Finally, I want to part with some comments on a fairly old announcement that Docker was going to move all “official” images to Alpine Linux. Historically, and even today, most official Docker images are built off of different versions of Debian base images. But what does any of this mean to an end user?

 

I have no special insight into the projects and decisions made at Docker, but from feedback I have received, and analysis I have done myself, it appears that this move has not been completed as of this publishing (08/18/2017). It appears that, generally, the end user has to select the Alpine version of any given official image by choosing a specific tag (examples: php & node.js), while others don’t have an Alpine option in their latest version (example: java). I want to be clear – Docker Inc. defines what this official designation means here and generally controls the design and build of all “official” images. There is a wide range in the quality of the architectures of the builds, as well as the documentation – obviously, they cannot possibly be experts on every piece of software within these container images. While I think there is some merit in their goals around patching, best practices, and documentation, the real value to an end user is debatable. While some of the official images appear to be collaborations with upstream projects or ISVs, many are just simple images built by Docker Inc. to try to seed their ecosystem.

 

When Docker Inc. blindsided the Debian community with the Hacker News announcement that they were moving all official images to Alpine Linux base images, Dustin Kirkland from Canonical wrote a response. I think he did a decent job. He mentioned that base image size really isn’t the most important thing to evaluate in a production environment – especially when you are caching images anyway. He even mentioned the ecosystem and how important the number of packages in a Linux distribution is, but I think this blog entry goes further to explain all the things that need to be thought about when choosing a base image.

 

At the end of the day, many end users still choose to build their own images anyway – partially because of trust, and partially because end users have such a wide range of needs that it’s nearly impossible for any one company to build for every permutation of user needs.

 

I certainly think Red Hat leads in all of the categories that I have set forth in this article – but I also think that Docker’s stated goal of moving from Debian to Alpine was shortsighted and naive. I think it shows a lack of technical acumen in evaluating a Linux distribution and its capabilities – especially with regard to the resources that it takes to truly build and maintain a distro over time. I also think it shows a lack of business acumen, as they probably could have partnered with a commercial vendor. Docker’s ability to maintain a Linux distribution is clearly stressed – look at the number of packages maintained by each contributor. So, I leave you with this final thought: Debian (or even Ubuntu) is better suited than Alpine in almost all of these categories, but I think RHEL is still the best 🙂 #justsayin

 

 

Update: I have made some changes to my description of how Official DockerHub images are built and what that means for an end user. Thank you, Phil Estes, for the solid feedback.

 

    20 Comments

  1. That table is awesome. So much information all in one place.

    • Deb, hello. I saw your comment after enjoying the article.

    • Thanks for the feedback for both of those. My chart above is talking about automated testing of performance and security. Many/most/all distributions have automated CI testing, but I don’t see where this is performance based? e.g. tests web server throughput or mail server throughput, or anything functional like that?

      • Hi, thanks for your followup.

        Regarding security, the Debian tests for Apache, for instance, contain tests for specific CVEs.

        https://ci.debian.net/data/autopkgtest/unstable/amd64/a/apache2/20170825_115019/log.gz

        For example, these CVEs are tested in the log above.

        t/security/CVE-2005-3352.t ………. ok
        t/security/CVE-2005-3357.t ………. ok
        t/security/CVE-2006-5752.t ………. ok
        t/security/CVE-2007-5000.t ………. ok
        t/security/CVE-2007-6388.t ………. ok
        t/security/CVE-2008-2364.t ………. ok
        t/security/CVE-2009-1195.t ………. ok
        t/security/CVE-2009-1890.t ………. ok
        t/security/CVE-2011-3368-rewrite.t .. ok
        t/security/CVE-2011-3368.t ………. ok

        Still researching for more info regarding automated performance/throughput testing… I’ll reply back if I find anything specific.

  2. Serendipitous timing. As I’m reading this in another window I’m working on an email thread requesting a new VM for testing a containerized app (vendor supplies it as a Docker app).

    It’s my first encounter with containerization, and your post is really helpful to me, although a bit confusing because I go way back in the o/s space (let’s just say I was shipping kernel code before Linus Torvalds’ first released any of his code). The classical terminology with which I’ve been familiar in the past distinguishes third-party libraries (such as glibc etc) from the core o/s package, so it is taking me a bit wrapping my head around your description of containers as including part of the o/s.

    That’s been aggravated because the initial plan for our environment was to host the Docker app supplied by our vendor on Microsoft Server 2016. RHEL was the other preferred alternative, but even the modest annual subscription for Docker EE was an issue (we’re a government agency with strong financial controls, and Server 2016 provides Docker support for no additional cost – too bad it didn’t work!). As a result we are now evaluating Ubuntu on Docker CE, to avoid the procurement red tape.

    Point is, there are also non-technical issues to be considered in some organizations. We already have Ubuntu in house, as well as RHEL and M$ Server 2016, so platform technical issues are not as daunting concern.

    FYI re Server 2016, there were issues just getting it installed and running the Docker install verify “hello world” apps. Took a few days to get past those, and then the supplied app had problems because the fs layout seems to be different in some areas. Just not worth debugging vendor issues, so we’re probably deploying the Linux variant to production.

    • Brujo,
      Thanks for your question. Glad I could remove some confusion; hopefully I can help with your questions too. I am going to try to parse it below with inline text 🙂

      > Serendipitous timing. As I’m reading this in another window I’m working on an email thread requesting a new VM for testing a containerized app (vendor supplies it as a Docker app).

      Perfect timing 🙂

      > It’s my first encounter with containerization, and your post is really helpful to me, although a bit confusing because I go way back in the o/s space (let’s just say I was shipping kernel code before Linus Torvalds’ first released any of his code). The classical terminology with which I’ve been familiar in the past distinguishes third-party libraries (such as glibc etc) from the core o/s package, so it is taking me a bit wrapping my head around your description of containers as including part of the o/s.

      Your point is quite fair. I also never used to think of glibc and libraries as “part of” the OS, but nowadays most people refer to all of the “stuff” that comes along on the ISO, in the AMI, or now in the container image as part of the OS. Also, the OS vendors are typically responsible for performance, security and features in all of this content, so colloquial usage has started to refer to all of this as part of the OS.

      > That’s been aggravated because the initial plan for our environment was to host the Docker app supplied by our vendor on Microsoft Server 2016. RHEL was the other preferred alternative, but even the modest annual subscription for Docker EE was an issue (we’re a government agency with strong financial controls, and Server 2016 provides Docker support for no additional cost – too bad it didn’t work!). As a result we are now evaluating Ubuntu on Docker CE, to avoid the procurement red tape.

      As a side note, there are “docker-current” and “docker-latest” packages which come with RHEL and are maintained by Red Hat. So, you are welcome to give those a try as well. Also, with the advent of OCI, Red Hat is very much researching and driving alternative runtimes like CRI-O. You can also keep track of progress on the CRI-O blog here.

      > Point is, there are also non-technical issues to be considered in some organizations. We already have Ubuntu in house, as well as RHEL and M$ Server 2016, so platform technical issues are not as daunting concern.

      Yeah, there are definitely technical issues to think about, and the rabbit hole goes very deep in the container image and host space. For a deep, deep guide, check this out: https://www.redhat.com/en/resources/container-image-host-guide-technology-detail

      > FYI re Server 2016, there were issues just getting it installed and running the Docker install verify “hello world” apps. Took a few days to get past those, and then the supplied app had problems because the fs layout seems to be different in some areas. Just not worth debugging vendor issues, so we’re probably deploying the Linux variant to production.

      That is painful to hear. I promise you that you can run “hello world” on RHEL quite easily 😉 We are doing a ton of work in containers and this is a well-beaten path. Heck, I am about to publish an article on running “hello world” with CRI-O, and even though that is very new software, it was quite straightforward. (Entry to come soon on the CRI-O blog mentioned above.)

  3. You can always tailor base images to your needs. For example, if you don’t need ‘systemd’ but almost always include ‘curl’, it’s worth exploring the distributions’ tools for slimming those images down. The fewer packages you ship, the smaller the attack surface becomes.

    For example, use Gentoo’s “catalyst” or Debian’s/Ubuntu’s “debootstrap”. Use their packages, perhaps tailor one or two to your needs. Do it in an automated way and you have the best of all worlds.

    I for one do this, and have slimmed down Ubuntu to 39 MB (16 on the wire). You can find the result here, including instructions on how to reproduce it:
    https://github.com/Blitznote/debase

    • I don’t disagree. You can always tailor a base image, just like operations teams have architected gold builds for 20 years – it’s really not terribly different. My one slight nitpick is that people are obsessed with size even though it is NOT the highest priority in a production environment. Size absolutely matters for demos at a conference with poor wifi connectivity, but in production, where things are cached, DRY [1] is much more important.

      First and foremost, ALWAYS use layers. Then, anything you use more than once should be pushed into the parent layer. This will result in bigger base and intermediate images, but much more efficiency at scale, with hundreds of applications relying on the same supply chain. I am working on a draft that I can’t share just yet, but here is a list of some best practices (with a small Dockerfile sketch after the list):

      Best Practices

      • Layer your application
        • The number of layers should reflect the complexity of your application
        • Containers are a slightly higher level of abstraction than an rpm
      • Avoid solving every problem inside the container
        • Use the start script layer to provide a simple extraction from the process run time
        • Build clear and concise operations into the container to be controlled by outside tools
      • Identify and separate code, configuration and data
        • Code should live in the image layers
        • Configuration, data and secrets should come from the environment
      • Containers are meant to be restarted
      • Don’t re-invent the wheel
      • Never build off the latest tag; it prevents builds from being reproducible over time
      • Use liveness and readiness checks

      [1]: http://rhelblog.redhat.com/2016/02/24/container-tidbits-can-good-supply-chain-hygiene-mitigate-base-image-sizes/
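      And here is the small Dockerfile sketch I mentioned above (image names and packages are purely illustrative): shared dependencies live in a parent image that every application builds from, so they are stored, cached, and patched once.

        # the shared parent layer (patched in one place for every app that uses it)
        {
          echo 'FROM registry.access.redhat.com/rhel7'
          echo 'RUN yum -y install curl openssl zlib && yum clean all'
        } > Dockerfile.base
        docker build -t mycompany/base:1.0 -f Dockerfile.base .

        # one of many applications reusing that layer
        {
          echo '# build from a pinned parent, never from :latest'
          echo 'FROM mycompany/base:1.0'
          echo '# code lives in the image layers'
          echo 'COPY app/ /opt/app/'
          echo '# configuration and secrets come from the environment at run time'
          echo 'CMD ["/opt/app/run.sh"]'
        } > Dockerfile.app
        docker build -t mycompany/app:1.0 -f Dockerfile.app .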

  4. Good summary of points to think about. One nitpick: the Java image has been deprecated for almost a year now, so it doesn’t have an Alpine-based version. It is replaced by the openjdk image, which does have Alpine-based images:

    https://hub.docker.com/_/openjdk/

    • Totally fair. I missed that. I will update.

  5. My favourite docker image “scratch” is missing from the list…. 0 bytes

    • I am going to write a follow-on blog to really highlight this point, but generally, in a large environment at scale, it is just best to use an existing Linux distribution for dependency management. While anybody can start with “scratch” and manage their own dependencies (aka build their own Linux distribution, because that’s a huge value of Linux distributions – think yum or apt), it’s not fun, nor does it typically provide return on investment. Also, starting with scratch without discipline almost always leads to crazy image sprawl, which is also very bad at scale [1].

      Generally, developers should focus their time on more productive, more business focused tasks like building new apps, not maintaining thousands of custom built container images, each with their own chaotic dependency chain nightmare [2].

      [1]: http://rhelblog.redhat.com/2016/02/24/container-tidbits-can-good-supply-chain-hygiene-mitigate-base-image-sizes/
      [2]: https://developers.redhat.com/blog/2016/05/18/3-reasons-i-should-build-my-containerized-applications-on-rhel-and-openshift/

      • I don’t suggest building a userspace…

        I should have elaborated more. Most docker images I build are statically compiled Go binaries without dependency on libc. So my images are basically from scratch + single binary.

        • Yeah, that has always made sense to me. With Go binaries, and maybe even Java images, you can get away with that. Even then, I typically put a “layer” in between to insert curl and other standard things I want in all images for troubleshooting at scale in a distributed systems environment.

        • Go, even if it does not depend on libc, still needs some files in userspace. Building from “scratch” seems sufficient to beginners – which in most trivial cases it is – but indeed it’s not:

          For example, Go needs /etc/services*, /etc/mime.types, /etc/ssl/ca-certificates*, timezone data, and so on.
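          A minimal multi-stage Dockerfile sketch that copies those pieces in (the Go version, image tags, and paths are illustrative, not prescriptive):

            # build a static binary, then assemble a scratch image with the userspace files it still needs
            {
              echo 'FROM golang:1.9 AS build'
              echo 'WORKDIR /src'
              echo 'COPY . .'
              echo 'RUN CGO_ENABLED=0 go build -o /app .'
              echo 'FROM scratch'
              echo '# add /etc/services, /etc/mime.types, etc. as your code requires'
              echo 'COPY --from=build /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/'
              echo 'COPY --from=build /usr/share/zoneinfo /usr/share/zoneinfo'
              echo 'COPY --from=build /app /app'
              echo 'ENTRYPOINT ["/app"]'
            } > Dockerfile
            docker build -t myapp:scratch .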

          • Agree, one must be very careful using scratch and test their applications well. I actually copy ca-certificates into my images that need it; tzdata is a good point. Go bundles tzdata into the runtime in case the OS does not provide it, but it could get outdated. I run my containers inside Kubernetes, which takes care of adding /etc/resolv.conf, the hostname, etc.

            My strategy is scratch for lightweight microservices/data pipelines, alpine when CGO (or an external executable) is required, and debian for debugging.

  6. Great article as always. I never imagined my post about Alpine would have ever gone further than the publish button in my CMS.

    I have talked to quite a few security vendors lately and I always ask what their preferred base image is. From my ad-hoc reporting, it is between Red Hat and Debian. Something interesting that was pointed out to me in these discussions is to pay attention to the amount of time required to fix critical CVEs for the different images. Needless to say, Red Hat and Debian do a good job in this department.

    FYI, I am looking forward to your CRI-O article.

