Background
So, this morning I had a call with some customers who are using Podman in RHEL 7.6 Beta. We got into a pretty good discussion about what a container engine does. Many people have tackled this subject before; Liz Rice has a great talk where she builds a container engine from scratch. I loved her talk and have been contemplating how to expand on it for a long time. A few weeks ago at DevConf, I gave the most comprehensive container engine/runtime presentation I have ever given: Your Architect Brain Needs to Understand How a Container Engine Works (recording & presentation).
But, this morning, I decided to go further and push the boundaries 🙂 I wanted to hack together a demo of what a container engine does, and I figured what better way to do it than by live experimentation with customers, right? To my surprise, I got the demo to work! I think this builds on Liz's approach by using the actual building blocks used by Docker, Podman, Buildah, and CRI-O. These building blocks include the Open Container Initiative (OCI) image, distribution, and runtime specifications, as well as the OCI reference runtime implementation, runc. If you want a hands-on explanation of how containers work, this blog entry is for you.
As I have stated before, a container engine does three major things. To demonstrate them, here is a quick walk-through of what we are going to do:
- We are going to use Podman to construct a config.json file
- We are going to use Buildah to pull an image, expand it, cache it locally, and expose it as a directory (rootfs) that we can use
- We are going to use runc to fire up a container using the pre-built config.json and the expanded rootfs
OK, first, let's create a working directory to collect our files. Warning: there will be some sed-fu in this tutorial to simplify the commands 🙂
mkdir -p /root/projects/containers-parts
cd /root/projects/containers-parts
Step 1: Podman
We'll use Podman to create a config.json for us. This file can be extremely complex. While runc can create a simple config.json, I want to use one built by Podman because it will have all kinds of configuration options in it. Some of these options come from the container image, some from the container engine itself, and some from the command-line options that we specify when firing up Podman. These three sources are explained in more depth here. Fire up an ephemeral (--rm) container with Podman:
Note: you might be asking yourself why we are running this with the --privileged flag. That's because we are manipulating Podman into creating a config.json without some of the hard-coded settings that would cause runc to fail when run outside of Podman:
podman run --rm --privileged -it rhel7 bash
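Now, in a second terminal, let's steal the config.json file that Podman created and copy it into our working directory (the -l flag limits podman ps to the most recently created container, so the grep still works if other containers are running):
cp $(find /var/lib/containers/ | grep $(podman ps -q -l) | grep config.json) /root/projects/containers-parts/config.json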
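If that find | grep pipeline looks like magic, the sketch below rebuilds a fake version of the storage layout in a scratch directory and runs the same kind of search. It assumes Podman keeps each container's generated config at overlay-containers/<container-id>/userdata/config.json under its storage root and that podman ps -q prints a shortened ID; the IDs themselves are made up.

```shell
#!/bin/sh
# Sketch: rebuild a fake copy of Podman's per-container layout in a scratch
# directory and run the same kind of find | grep search against it.
# ASSUMPTION: the overlay-containers/<id>/userdata/config.json layout; the IDs
# below are made up.
set -eu

storage=$(mktemp -d)
trap 'rm -rf "$storage"' EXIT

running_id=3f2e1d0c9b8a7766554433221100ffee
other_id=0123456789abcdef0123456789abcdef

# One config.json per container, as if two containers existed.
for id in "$running_id" "$other_id"; do
  mkdir -p "$storage/overlay-containers/$id/userdata"
  echo "{}" > "$storage/overlay-containers/$id/userdata/config.json"
done

# podman ps -q prints a shortened ID; grep matches it as a substring of the
# full path, so only the running container's file survives the filter.
short_id=$(printf '%s' "$running_id" | cut -c1-12)
found=$(find "$storage" -name config.json | grep "$short_id")
echo "$found"
```

The same substring trick is why the real command needs exactly one ID from podman ps; with several running containers, the extra IDs would be handed to grep as file names.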
Back in the first terminal, exit the container. Because of the --rm flag, the container will be completely destroyed and its storage removed. After this command, the container no longer exists.
exit
We can confirm this with the following commands; neither should list the container:
runc list
podman ps -a
Step 2: Buildah
Next, we are going to use Buildah to prepare a rootfs for us. Buildah will pull the necessary image layers, store them in local container storage (i.e., cache them), and then mount them with a copy-on-write layer on top so that we can treat the result as a regular filesystem. This sounds complex, and it is behind the scenes, but the command is super simple:
buildah mount $(buildah from rhel7)
Notice that the output of the above command is just a directory path. The contents of this directory look strikingly like what you see inside a running container. It has all the same files and directories. That's because it's exactly what you would see when running a container with Podman or docker: a combination of all of the container image layers laid out on disk, with a copy-on-write layer on top for writes within the container. For example (the layer ID on your system will differ):
ls /var/lib/containers/storage/overlay/fecdb77bd997c9df2812d661081347d35cb448a9cd8cc5b33a8e31bf044df814/merged
Step 3: runc
OK, now we are going to combine the garlic and mushrooms to make the soup. But first, we have to hack the config.json to use the rootfs that we created with Buildah instead of the storage used by the original Podman container. Run this from the working directory; it mounts the existing working container (Buildah names it rhel7-working-container by default) rather than creating a second one:
sed -i "s#overlay/.*/merged#overlay/$(buildah mount rhel7-working-container | cut -d '/' -f 7)/merged#" config.json
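Before pointing sed at the real file, it can help to see exactly what that one-liner changes. This sketch runs the same cut and sed steps against a throwaway, trimmed-down stand-in for config.json; the layer IDs are made up, and new_mount is a hypothetical stand-in for the path that buildah mount prints.

```shell
#!/bin/sh
# Sketch: dry-run the config.json rewrite on a scratch copy. Both layer IDs
# are made up; in the tutorial, new_mount comes from buildah mount.
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# A trimmed-down stand-in for the "root" section of podman's config.json.
cat > "$workdir/config.json" <<'EOF'
{
  "root": {
    "path": "/var/lib/containers/storage/overlay/1111aaaa/merged"
  }
}
EOF

new_mount=/var/lib/containers/storage/overlay/2222bbbb/merged

# Splitting on "/" leaves field 1 empty (before the leading slash), so the
# fields run var, lib, containers, storage, overlay, and field 7 is the ID.
layer_id=$(echo "$new_mount" | cut -d '/' -f 7)

# Swap the old layer ID for the new one (GNU sed in-place edit).
sed -i "s#overlay/.*/merged#overlay/$layer_id/merged#" "$workdir/config.json"
result=$(grep '"path"' "$workdir/config.json")
echo "$result"
```

The cut field number only works for the default storage root under /var/lib/containers/storage; a different graph root would shift the fields.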
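Now, from the same working directory (runc treats the current directory as the container bundle by default), let's run a new container with runc: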
runc run container-parts
Alright, what just happened? We manually created a container with runc. From a technical perspective, it's almost identical to what we would have gotten had we created a new container with Podman or docker. Check it out from the second terminal:
runc list
ID                PID    STATUS    BUNDLE                            CREATED                          OWNER
container-parts   7076   running   /root/projects/containers-parts   2018-09-19T15:56:10.906280862Z   root
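If you want to script against this output, a small awk filter is enough. The sketch below feeds it the sample output shown above instead of live runc list output, so that table text is the only input it assumes.

```shell
#!/bin/sh
# Sketch: extract the PID and STATUS for one container from runc list-style
# table output. The here-document is the sample output from this post; on a
# live system you would pipe runc list into awk instead.
set -eu

info=$(awk -v id=container-parts 'NR > 1 && $1 == id { print $2, $3 }' <<'EOF'
ID                PID    STATUS    BUNDLE                            CREATED                          OWNER
container-parts   7076   running   /root/projects/containers-parts   2018-09-19T15:56:10.906280862Z   root
EOF
)

# Skipping the header (NR > 1) and matching on the ID column keeps the
# filter from tripping over other containers listed in the same table.
echo "$info"
```

On a live system, runc state container-parts reports the same PID and status as JSON, which is sturdier to parse than the table.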
Conclusion
Even though containers are a black box for a lot of people, there's nothing magical here. Abstraction provides convenience, but understanding what's going on under the covers, even generally, will help you make better architectural decisions in your environment. When we understand the building blocks (tools and container standards), we can build much more robust systems. Podman, Buildah, and Skopeo can be wired together to provide a lot of flexibility in our environment.
If we wanted to extend this tutorial, it would be easy to modify the filesystem inside the runc container (or directly in the overlay filesystem returned by Buildah; maybe you caught that?), commit the changes with Buildah, and push them to Quay.io with Skopeo. Then, because of the OCI image and distribution specifications, we could pull down the new container image and run it with Podman, CRI-O, or even the docker engine, because they are all OCI-compliant tools.
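Here is a hedged sketch of that extension as a script. The working container name assumes Buildah's default naming (rhel7-working-container), and the image name and Quay.io namespace are hypothetical placeholders; you would also need to be logged in to Quay.io before the push succeeds. By default it only prints the commands (DRY_RUN=1) so you can review them first.

```shell
#!/bin/sh
# Sketch of the commit/push/run extension. IMAGE and QUAY_NAMESPACE are
# hypothetical placeholders; CONTAINER assumes Buildah's default name.
# With DRY_RUN=1 (the default) the commands are printed, not executed.
set -eu

DRY_RUN=${DRY_RUN:-1}
CONTAINER=${CONTAINER:-rhel7-working-container}
IMAGE=${IMAGE:-my-modified-rhel7}
QUAY_NAMESPACE=${QUAY_NAMESPACE:-example-namespace}

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = 1 ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# 1. Snapshot the working container's filesystem as a new local image.
run buildah commit "$CONTAINER" "$IMAGE"
# 2. Copy it from local container storage to the registry.
run skopeo copy "containers-storage:localhost/$IMAGE:latest" "docker://quay.io/$QUAY_NAMESPACE/$IMAGE:latest"
# 3. Any OCI-compliant engine can now pull and run it; Podman shown here.
run podman run --rm -it "quay.io/$QUAY_NAMESPACE/$IMAGE:latest" bash
```

Run it with DRY_RUN=0 (and your own namespace) to execute the commands for real.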
As always, if you have any further questions, feel free to post them below…