My problem, like that of most technologists, is that I only have a slice of my time to dedicate to acquiring and maintaining knowledge about any given technology, product, project, tool, or platform. Combine that with the fact that almost every CIO is preaching that we, as technologists, need to be closer to the business, and our time gets even thinner. This Hacker’s Guide is meant to get the smart, motivated individual up and running much quicker than just poring over the docs.
The OpenShift documentation is very good and there is a wealth of information. This article will point you to the bits and pieces that you absolutely need to read and understand to get up and running. It will also fill in a few gaps for the old sysadmin curmudgeon – like me. It’s often easiest to follow the exact process that someone else took when upgrading or installing a piece of software, so I documented the path I took, with some explanations on why. Hopefully, this helps you install or upgrade to the latest version of OpenShift with a minimum of stumbling.
Give this guide 60 minutes, and you will be a rockstar at installing OpenShift. Note: this guide is an update to the Hacker’s Guide to Installing OpenShift Container Platform 3.9.
Where to Start
If you are impatient, skip this section, but if you are going to invest the time, get a cup of coffee, and skim these links. These are the links I digested to come up with this guide:
- Release Notes – these are particularly good for OpenShift. They explain a lot about which features are generally available and how to get started with them.
- OpenShift System Requirements – the installation manual provides a great list of minimum requirements. For test setups, you can get away with WAY fewer resources. I have successfully installed OCP 3.11 on hosts with 1 vCPU, 2048MB RAM, and 40GB of disk.
- Environment Health Checks – The OCP documentation has a section called the Day 2 Operations Guide and it is phenomenal. In the spirit of DevOps, CI/CD, and testing, there are some extensive built-in checks which verify that an installation is working.
- Installing Clusters – This is the main guide that I followed to derive these simplified instructions.
- Preparing for an Upgrade – Whether installing or upgrading, the upgrade guide gives a pretty good feel for what the OCP installer does to a cluster.
Installation For the Impatient
Now, if you skimmed those like I did, you might have missed some things. Here are a few key pieces of information to internalize:
- The version of OpenShift maps to the version of Kubernetes on which it is built. OpenShift Container Platform (OCP) 3.7 is built on Kubernetes 1.7, OCP 3.8 -> Kube 1.8, and OCP 3.11 -> Kube 1.11
- MAKE SURE you use Ansible version 2.6. I didn’t pay attention and wasted the better part of a day messing around with Ansible 2.7 madness.
- MAKE SURE you use a brand new RHEL box installed from scratch. I had a box on which I had previously installed OCP 3.6, 3.7, and 3.9, and OCP 3.11 would not install. I messed with it for hours and couldn’t figure out what I did to the box. OCP really wants to control everything on the box from scratch. Don’t use a corporate core build, use a scratch install.
- You can use CRI-O and still use /var/lib/containers as your mount point because the Ansible installer moves docker storage to /var/lib/containers/docker. You still need space for each container engine – CRI-O for running containers and docker for builds.
- Do not run a yum update on your nodes and expect it to work. A lot of care is taken in the Ansible playbooks to upgrade etcd, the Kubernetes masters, and the Kubernetes nodes.
- As of OCP 3.9, the manual upgrade process is no longer supported
- As of OCP 3.11, CRI-O 1.11 is fully supported and considered Generally Available (GA)
- With this release there is a dedicated CRI-O guide which explains how to use CRI-O. You can install it with OpenShift or add nodes later.
- With this release there is a general trend toward using crictl instead of docker to troubleshoot node problems.
- If you don’t know what crictl is, or have never heard of it, don’t worry, here’s a quick write-up. And, here is a comparison to Podman.
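The first two bullets are easy to turn into quick preflight checks. The following is a minimal sketch, not part of the installer: the function names `ocp_to_kube` and `ansible_is_2_6` are made up for illustration, and the version mapping only holds for the OCP 3.x series described above.

```shell
#!/bin/sh
# Hypothetical helpers, not part of openshift-ansible.

# Map an OCP 3.x version to the Kubernetes version it is built on,
# e.g. 3.11 -> 1.11. Any z-stream suffix (3.11.43) is ignored.
ocp_to_kube() {
    case "$1" in
        3.*)
            minor="${1#3.}"        # drop the leading "3."
            minor="${minor%%.*}"   # drop anything after the minor number
            echo "1.$minor"
            ;;
        *)
            echo "unknown"
            return 1
            ;;
    esac
}

# Succeed only if the given version string is in the Ansible 2.6 series.
ansible_is_2_6() {
    case "$1" in
        2.6|2.6.*) return 0 ;;
        *)         return 1 ;;
    esac
}
```

On the install host this could be used as `ansible_is_2_6 "$(ansible --version | awk 'NR==1 {print $2}')" || echo "wrong Ansible version"`, assuming the first line of `ansible --version` has the form `ansible 2.6.8`.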
I would break the installation into three basic steps – Preparation, Run Playbooks, and Test & Verify.
Most of the installation is handled by the OpenShift Container Platform installer which is based on Ansible. But, before the installer can take over and get OCP installed, you have to do a little preparation.
Install Red Hat Enterprise Linux (RHEL) – This is what gives you access to hardware and/or virtual machines on almost any cloud provider. Since RHEL runs almost anywhere, OpenShift can run almost anywhere.
Configure Container Engine
Get your container engine up and running. Versions up to OCP 3.5 only supported the docker daemon. With OCP 3.9 and above, you can use CRI-O in production. With 3.11 there is very little configuration necessary. Here is some background:
- OCP 3.6 began using the Kubernetes Container Runtime Interface (CRI). This meant that communication between Kubernetes and the docker daemon was through a standard interface and shim layer daemon. This was the beginning of pluggable support for other container engines.
- OCP 3.7 supported the docker daemon and provided CRI-O support as a tech preview.
- OCP 3.9 supported the docker daemon and made support for CRI-O generally available. Note that the docker daemon is still needed for container image builds. Support for Buildah is coming soon.
- With the docker daemon, you have to get the package installed and storage configured. Red Hat provides the docker-storage-setup script to make this easier, but your mileage may vary on cloud servers. Many cloud instances don’t provide enough storage, nor the facility to partition in a way that is compatible with the storage script. Furthermore, not all cloud servers even use XFS (Linode uses EXT4 instead of XFS because their dynamic resizing relies on it). Full instructions on setting up the docker daemon can be found here.
- I typically disable the docker storage checks because I don’t mind using loopback and device mapper for builds. Even though it is slower for builds, it works fine in practice unless you are performing a ton of builds. Down the road I plan on relying on Buildah, which will use overlay2 by default.
- CRI-O is designed for use in an automated environment. Because it leverages technologies like OverlayFS, it requires zero configuration by the end user. All you do is install the “cri-o” package and run the OCP installer. It’s configured securely by default, so only OCP can run containers. You specify a few OCP installer options, get a cup of coffee, and it’s set up for you.
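The installer options mentioned in the last two bullets live in the `[OSEv3:vars]` section of the Ansible inventory. The sketch below writes such a fragment to a file so it can be merged into an inventory by hand. The variable names (`openshift_use_crio`, `openshift_use_crio_only`, `openshift_disable_check`) are the ones the 3.11 `openshift-ansible` playbooks are understood to read, but the values are assumptions to check against the CRI-O guide, and the function name `write_crio_vars` is purely illustrative.

```shell
#!/bin/sh
# Hypothetical helper: write an [OSEv3:vars] fragment to the file given as $1.
write_crio_vars() {
    cat > "$1" <<'EOF'
[OSEv3:vars]
# Run pods with CRI-O (assumed: docker stays available for image builds)
openshift_use_crio=True
openshift_use_crio_only=False
# Skip the docker storage check so loopback/device mapper is tolerated
openshift_disable_check=docker_storage
EOF
}
```

For example, `write_crio_vars /tmp/crio-vars.ini` produces a file whose lines can be pasted under the existing `[OSEv3:vars]` header of an inventory.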
Setup Red Hat Subscriptions – instructions here. Basically:
subscription-manager attach --pool=YOURPOOLID
subscription-manager repos --disable="*"
subscription-manager repos --enable=rhel-7-server-rpms --enable=rhel-7-server-extras-rpms --enable=rhel-7-server-optional-rpms --enable=rhel-7-server-supplementary-rpms --enable=rhel-7-server-rh-common-rpms --enable=rhel-7-server-ose-3.11-rpms --enable=rhel-7-fast-datapath-rpms
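That last command is long enough to mistype, so one option is to keep the repository names in a list and generate the `--enable=` arguments from it. This is only a sketch: the repository names are copied from the command above, while `OCP_REPOS` and `enable_args` are names invented here.

```shell
#!/bin/sh
# Repositories copied from the subscription-manager command above.
OCP_REPOS="rhel-7-server-rpms
rhel-7-server-extras-rpms
rhel-7-server-optional-rpms
rhel-7-server-supplementary-rpms
rhel-7-server-rh-common-rpms
rhel-7-server-ose-3.11-rpms
rhel-7-fast-datapath-rpms"

# Print one --enable=<repo> argument per line for each
# whitespace-separated repository name in $1.
enable_args() {
    for repo in $1; do
        printf '%s%s\n' '--enable=' "$repo"
    done
}
```

The arguments can then be passed along with `subscription-manager repos $(enable_args "$OCP_REPOS")`, and `subscription-manager repos --list-enabled` shows what ended up enabled.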
Install the Installer & Dependencies
Install the Ansible installer somewhere.
- This can be on a completely different machine. In fact, it’s typically on a machine separate from the cluster. I typically do all of my installs from a single machine by making copies of /root/.config/openshift. This makes it easy to manage installs/uninstalls of multiple clusters from one command and control box. Here is how I manage my /root/.config directory.
Enable the channels and install – full instructions here. Basically, this should get your installer up and running. For safety, I version lock Ansible. Without this, I got burned because I had EPEL attached:
subscription-manager repos --enable=rhel-7-server-ansible-2.6-rpms --enable=rhel-7-server-ose-3.11-rpms
yum install -y openshift-ansible-playbooks yum-plugin-versionlock wget git net-tools bind-utils yum-utils iptables-services bridge-utils bash-completion kexec-tools sos psacct
yum versionlock ansible-2.6.8-1.el7ae.noarch
Once you have your hosts prepped and the Ansible installer installed on a host, you have to create an Ansible inventory file for use with the installer. The quick installer hack that I used to show no longer works because the Quick Installer has been deprecated. Start with an Ansible inventory file; you can find some examples in the Installing Clusters guide here. Here is a copy of the inventory file I used. It might help you get further along. Copy one and start hacking on it. Be sure to add a few things like:
This playbook makes sure all of the packages are installed and checks a bunch of things:
ansible-playbook -vvv -b -i /root/.config/openshift/hosts /usr/share/ansible/openshift-ansible/playbooks/prerequisites.yml
Deploy Cluster (Formerly Advanced Installer)
This playbook kicks off the installer:
ansible-playbook -vvv -b -i /root/.config/openshift/hosts /usr/share/ansible/openshift-ansible/playbooks/deploy_cluster.yml
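Since the prerequisites and deploy playbooks are always run back to back, it can be handy to wrap them so the second only starts if the first succeeds and the verbose output lands in a log file. The following is a rough sketch under those assumptions: the paths are the ones used in the commands above, and the function names and log file naming are invented for illustration.

```shell
#!/bin/sh
# Paths from the commands above; override via the environment if yours differ.
INVENTORY="${INVENTORY:-/root/.config/openshift/hosts}"
PLAYBOOK_DIR="${PLAYBOOK_DIR:-/usr/share/ansible/openshift-ansible/playbooks}"

# Print the full path of a playbook given its path relative to PLAYBOOK_DIR.
playbook_path() {
    echo "$PLAYBOOK_DIR/$1"
}

# Print the log file name used for a playbook, e.g. ocp-deploy_cluster.log.
log_name() {
    echo "ocp-$(basename "$1" .yml).log"
}

# Run one playbook, sending its verbose output to a log file in the
# current directory. Returns ansible-playbook's exit status.
run_playbook() {
    echo "Running $1 (log: $(log_name "$1"))"
    ansible-playbook -vvv -b -i "$INVENTORY" "$(playbook_path "$1")" \
        > "$(log_name "$1")" 2>&1
}

# Prerequisites first, then the cluster deploy; stop at the first failure.
install_cluster() {
    run_playbook prerequisites.yml && run_playbook deploy_cluster.yml
}
```

Calling `install_cluster` runs both steps; while it works, `tail -f ocp-deploy_cluster.log` in another terminal shows progress.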
Every now and then, you need to delete a few things manually as well:
ansible-playbook -vvv -i /root/.config/openshift/hosts /usr/share/ansible/openshift-ansible/playbooks/adhoc/uninstall.yml
rpm -e etcd flannel
rm -rf /etc/etcd /var/lib/etcd /etc/origin /root/.kube/ /var/log/journal/*
Sometimes if the installer fails, you can just restart the master portion to troubleshoot:
Sometimes if the installer fails, you can restart the node portion to troubleshoot:
ansible-playbook -vvv -i /root/.config/openshift/hosts /usr/share/ansible/openshift-ansible/playbooks/openshift-node/config.yml
A full list of other playbooks is documented here, as well as the order to run them in. Often it can be useful to run certain ones over and over when troubleshooting an installation.
For example, I recently installed a seven node cluster on virtual machines hosted at Linode. Since Linode doesn’t have the concept of a Virtual Private Cloud (VPC) like AWS, I needed to lay down some kind of network layer to make communications for NFS and etcd secure. I chose OpenVPN, which at first added some complexity to my installation. But, once I troubleshot the initial problems and got a working Ansible inventory file, the install ran flawlessly.
Test & Verify
OpenShift Container Platform has a ton of great documentation. There is so much of it that it helps to take some notes on where to find the choice tidbits. Here are some of my favorite tests to check that an OCP cluster is up and running properly.
Day Two Operations Guide
Within the OCP documentation is a section called the Day Two Operations Guide. Within it is a ton of great information about how to run OCP in production. One of my favorite sections for verifying an installation is called Environment Health Checks. Basically, run the following commands and if they complete, your environment should be pretty healthy:
oc new-project validate
oc new-app cakephp-mysql-example
oc logs -f bc/cakephp-mysql-example
The Cluster Administration chapter has a great section on Troubleshooting Networking which documents how to use a set of automated tests built into the oc command. The following is your friend when doing strange networking things (like OpenVPN below OCP 😛 ):
oc adm diagnostics NetworkCheck
The Upgrading Clusters chapter has a section called Verifying the Upgrade. Within it is a set of simple tests to verify a cluster. These are useful even for new installations:
oc get nodes
oc get -n default dc/docker-registry -o json | grep \"image\"
oc get -n default dc/router -o json | grep \"image\"
oc adm diagnostics
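When a cluster has more than a handful of nodes, eyeballing `oc get nodes` gets tedious. A small filter can list only the nodes that need attention. This is a sketch that assumes the default column layout (NAME, STATUS, ROLES, AGE, VERSION); the function name `not_ready_nodes` is made up here.

```shell
#!/bin/sh
# Read `oc get nodes --no-headers` output on stdin and print the names of
# nodes whose STATUS column does not start with "Ready". A status such as
# "Ready,SchedulingDisabled" still counts as ready for this purpose.
not_ready_nodes() {
    awk '$2 !~ /^Ready/ { print $1 }'
}
```

Usage would be `oc get nodes --no-headers | not_ready_nodes`, where empty output means every node reports Ready. In the same spirit, `oc get -n default dc/router -o jsonpath='{.spec.template.spec.containers[*].image}'` prints just the image without needing grep.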
OpenShift is a great platform built on the power of Kubernetes. Part of its value comes from the extensive power of the installer, documentation, and troubleshooting guides, which in the bigger picture is what facilitates the great ecosystem.
I have installed and uninstalled OpenShift hundreds of times – earning me the ability to troubleshoot almost anything in a distributed systems environment 🙂 I wanted to share some of my tips and tricks so that you can get your own environment up and running quicker. As you develop your own tips and tricks, please, please, please share them back. I welcome comments and feedback below…