I spent the past week putting together this Raspberry Pi cluster.
I’m pretty happy with the result. It’s a “plug and play” home server rack weighing about one kilogram. I can transport it anywhere, hook it to power and internet, and it will auto-run my website deployments.
I first hatched the idea after re-watching the classic 90’s anime Neon Genesis Evangelion.
The show depicts an organization which relies on a trio of supercomputers for operational decision making. The computers are named for the Biblical Magi, or the “Three Wise Men”. Of all of the things in the show, I’ve always been most intrigued by this. I’ve always wanted to have a personal computing cluster.
All anime inspiration aside, I also had some practical reasons for building a home cluster.
In the context of this project, I’m trying to build a cluster that will serve as a personal home server to host things on the internet. A cluster is a group of computer nodes connected and programmed to work together. Ideally, if one node fails, the others can continue work on the task.
The applications and websites I want to host are built into containers. Containers let you package software and run it on other systems regardless of whether those systems have the dependencies installed. If you wrote some code in Python, you could build it into a container, and a computer without Python installed could still run it.
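As a minimal sketch of what that looks like (the file names here are hypothetical), a container image for a small Python program can be described with a Dockerfile:

```dockerfile
# The base image bundles a Python runtime, so the host machine
# never needs Python installed to run the container.
FROM python:3.11-slim

WORKDIR /app

# Copy in the (hypothetical) Python program.
COPY app.py .

# Command executed when the container starts.
CMD ["python", "app.py"]
```

Building this with `docker build -t my-app .` produces an image that any machine with a container runtime can then execute with `docker run my-app`.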
Managing a single container on a server by hand might be easy, but complexity and management overhead grow quickly as you add more. It’s also just not feasible to watch container status 24/7, so we need some form of automation to handle this for us. Kubernetes is one such piece of software: it manages the deployment lifecycle of containers for us. We form a cluster by installing the cluster software on each system.
I want my cluster to have single-node failure tolerance, where if any one node went down, the cluster as a whole would continue operating. It’s preferable to keep the cluster up in some slightly degraded state rather than completely shut down.
A control plane (or master) Kubernetes node is one which makes ultimate decisions for the cluster. In contrast, Kubernetes nodes that do not make decisions and only run workloads are called workers.
A node designated as a control plane has components installed on it which allow it to control what a worker node does. It records the state of the cluster in a database called etcd.
However, if there was only one control plane node in a cluster and it became unavailable, the rest of the cluster would become inoperable. Since I want to have single-node failure tolerance, I need a cluster with more than one master. What’s the ideal number?
When multiple control plane nodes are present, they elect one amongst themselves to act as the leader. This is necessary to prevent problems with multiple masters all making “ultimate” decisions.
When the leader goes down in a multi-master configuration, one of the other control plane nodes should notice and take over managing the cluster. The ideal number of master nodes is odd: the cluster can only keep making decisions while a majority of its control plane nodes (a quorum) remain available, and it fails once fewer than a majority are left. Since a majority of n nodes is ⌊n/2⌋ + 1, going from an odd count to the next even count raises the majority requirement without letting you survive any more failures, so no resilience is gained at even numbers.
To illustrate this:
| Control Plane Nodes | Majority | Max. Failure Tolerance |
|---|---|---|
| 3 | 2 | 1 |
| 4 | 3 | 1 |
| 5 | 3 | 2 |
Since I want single-node failure tolerance, I know I have to run a minimum of three control plane nodes.
Designing for single-node failure tolerance also means I need to run at least two worker nodes, in case one worker fails. A setup of three dedicated control plane nodes and two workers would therefore require a total of five nodes.
To reduce the cost of this project, I’m going to cut this number down by having each node run both master and worker processes. This means container workloads may be scheduled onto the same nodes where control plane processes are running. I keep the minimum of three masters needed for failure tolerance while also letting every node run workloads.
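With K3s (the Kubernetes distribution I install later), server nodes are schedulable by default, so nothing extra is needed to make masters accept workloads. On a kubeadm-style cluster you would instead remove the control-plane taint, roughly like this (the taint key varies by Kubernetes version):

```
# Allow ordinary pods to be scheduled onto control plane nodes
# by removing the taint kubeadm applies to them.
kubectl taint nodes --all node-role.kubernetes.io/control-plane-
```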
We’ve covered node failure tolerance and decided on having three nodes with simultaneous master/worker duties. The next thing I want my cluster to have is a shared virtual IP.
I’m eventually going to need to send commands to the cluster’s API server, a control plane component which receives cluster administration commands.
When I connect the individual nodes to the same network, they’re each going to be assigned a unique internal IP address by my router, which poses a potential problem: what address should I be sending the API commands to?
If the nodes all have separate addresses, I would have to send commands directly to one of them (represented below as 10.0.0.9, 10.0.0.10, or 10.0.0.11). If that node were to become unavailable, I would need to switch to sending commands to another node’s address.
There are probably multiple ways to achieve this, but I’ll take care of it with keepalived, a Linux package which will allow the nodes in my cluster to broadcast to the network that they share one IP address.
In this setup, one node will always be an “active” receiver. If that node goes down, one of the others will automatically become the new active.
This doesn’t change any node’s existing IP. It only adds a shared virtual address on top of them, so the original internal IPs can still be targeted directly if necessary.
In the diagram, I’m using the arbitrary address 10.0.0.100 to represent the virtual IP. Since a Kubernetes master is available on all nodes, I can run administration commands against the virtual IP and they should reach the cluster even if one of the nodes is down.
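For reference, a minimal keepalived configuration for this kind of setup looks roughly like the following (the interface name, router ID, priority, and password are placeholders; the other nodes would run the same block with state BACKUP and lower priorities):

```
# /etc/keepalived/keepalived.conf (sketch)
vrrp_instance cluster_vip {
    state MASTER             # this node starts out holding the virtual IP
    interface eth0           # NIC the virtual IP is advertised on
    virtual_router_id 51     # must match on every node sharing the VIP
    priority 150             # highest available priority wins the election
    advert_int 1             # send health advertisements every second
    authentication {
        auth_type PASS
        auth_pass changeme
    }
    virtual_ipaddress {
        10.0.0.100/24        # the shared virtual IP
    }
}
```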
Let’s get started putting everything together.
I went ahead and purchased the following: three Vilros Raspberry Pi 4 (8GB) kits, three waveshare PoE HATs, a C4Labs Cloudlet case, a NETGEAR 5-port PoE gigabit switch, and a handful of 6-inch Cat6 patch cables (the full cost breakdown is at the end of this post).
🔊 If you were wondering, the fans from the HATs and the case are not loud at a normal load. It’s nothing close to a typical rack server; it creates a little more noise than my idle desktop computer.
Back on my laptop, I’m preparing the operating system installation.
I first obtained a Debian Linux release compatible with the Pis’ processors and imaged the OS onto the microSD cards using the official Raspberry Pi Imager.
Here’s a convenient trick when installing operating systems to Raspberry Pis: once the images are written to the microSD cards, edit the sysconf text file in the root of each card. You can append a hostname and a root_authorized_key so that when a node first boots up, it already has a name and is immediately SSH-accessible from your device.
$ tail -n 2 sysconf
hostname=melchior
root_authorized_key=ssh-rsa AAAAAAAAAAA11111111122222222= ken@LASTORDER
I now move on to installing system software. This includes system utilities and the obvious required tools: Kubernetes, the Docker container engine, and keepalived.
Since this is a repetitive task, I scripted the installation process using Ansible. If you’re unfamiliar with Ansible, it’s essentially an automation tool for running server administration tasks across many machines at once.
All of the Ansible code I used to configure the cluster is available on GitHub.
I structured the first Ansible playbook (a list of things to execute), common-config, to be called first to configure everything I expect to be exactly the same on all nodes. This playbook will:

- Update apt-managed packages.
- Install python3. Without Python, Ansible is a glorified shell-script runner.
- Install docker-ce (the engine that allows containers to run).
- Install utilities such as git, tmux, vim, netcat, and keepalived.
- Configure keepalived so that all of the nodes will use the same shared IP.

For Kubernetes software, I’ve opted to use K3s, a distribution that can be installed with relative ease and has releases for the ARM architecture of the Raspberry Pis.
The second Ansible playbook to run, k3s-install, initializes the cluster using two different roles (a role is a set of specific tasks that can be embedded into a playbook); a sketch of what the roles run follows the list:

- The k3s-init role runs on only one of the three nodes. It opens the ports required by K3s, installs K3s, and initializes a cluster in a custom way that makes it reachable from the shared virtual IP address. After the cluster finishes initializing, it outputs a secret token to be used by other nodes to join the cluster; Ansible saves this token.
- The k3s-join role runs on the remaining two nodes. It also opens the ports required by K3s, installs K3s, and uses the secret token from the k3s-init role to join the nodes to the cluster.
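Roughly speaking, both roles drive the standard K3s install script; here is a sketch of the commands involved (the real tasks template in the token and addresses, and the exact flags my roles use may differ):

```
# k3s-init (sketch): start a new cluster with embedded etcd and add the
# shared virtual IP to the API server certificate.
curl -sfL https://get.k3s.io | sh -s - server --cluster-init --tls-san 10.0.0.100

# The join token K3s generates on the first node:
cat /var/lib/rancher/k3s/server/node-token

# k3s-join (sketch): the other two nodes join as additional servers.
curl -sfL https://get.k3s.io | sh -s - server \
  --server https://10.0.0.100:6443 --token <token-from-k3s-init>
```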
Before I can start managing the cluster, I need to copy any node’s /etc/rancher/k3s/k3s.yaml K3s config to my local laptop’s ~/.kube/config.
Once copied to my local machine, I just need a slight modification so the shared keepalived virtual IP is set as the server address in the configuration file. This will allow my local machine to freely use the Kubernetes command-line tool, kubectl, and commands will reach the cluster’s control plane even if one node is down.
I’m expecting to serve public web traffic over HTTPS. Earlier, I installed a Kubernetes “ingress controller” using Ansible to allow my cluster to forward external traffic to services.
As long as relevant external requests make it to the cluster, Kubernetes can take it from there and route the traffic to the right container’s service. The last thing to do is to configure my router to forward inbound HTTPS (port 443) traffic to the shared virtual IP address.
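For illustration, the Ingress resource routing that traffic to the site might look roughly like this (the hostname, namespace, and TLS secret name are placeholders, and the exact annotations depend on the ingress controller installed):

```yaml
# Sketch: route HTTPS requests for the site to its Service.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: kenellorando
  namespace: kenellorando
spec:
  tls:
    - hosts: [site.example.com]        # placeholder hostname
      secretName: kenellorando-tls     # placeholder TLS certificate secret
  rules:
    - host: site.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: kenellorando     # the web server's Service
                port:
                  number: 80
```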
I first want to test that when a node is disabled, pods will be replaced on different nodes. This is a basic test of automated orchestration.
I’ve deployed a basic web server container called kenellorando and configured it so there is at least one replica running on each node.
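One way to express “roughly one replica per node” is three replicas plus a topology spread constraint; here is a trimmed sketch (the image and labels are placeholders, and the real manifest may use a different mechanism such as pod anti-affinity):

```yaml
# Sketch: 3 replicas, spread as evenly as possible across nodes.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kenellorando
  namespace: kenellorando
spec:
  replicas: 3
  selector:
    matchLabels:
      app: kenellorando
  template:
    metadata:
      labels:
        app: kenellorando
    spec:
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: kubernetes.io/hostname   # spread by node
          whenUnsatisfiable: ScheduleAnyway     # still run if a node is down
          labelSelector:
            matchLabels:
              app: kenellorando
      containers:
        - name: web
          image: ghcr.io/example/kenellorando:latest   # placeholder image
          ports:
            - containerPort: 80
```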
ken@LASTORDER:~$ k get nodes
NAME STATUS ROLES AGE VERSION
balthazar Ready control-plane,etcd,master 1d v1.22.7+k3s1
caspar Ready control-plane,etcd,master 1d v1.22.7+k3s1
melchior Ready control-plane,etcd,master 1d v1.22.7+k3s1
ken@LASTORDER:~$ k get pods -n kenellorando -o=custom-columns=NAME:.metadata.name,NODE:.spec.nodeName,STATUS:.status.phase
NAME NODE STATUS
kenellorando-6cbfb9f6f8-9dwzc melchior Running
kenellorando-6cbfb9f6f8-fbggv balthazar Running
kenellorando-6cbfb9f6f8-kvf44 caspar Running
Now, I will simulate the sudden failure of one of the nodes (by disconnecting its power) with the expectation that Kubernetes will notice and relaunch the pod onto one of the working systems. I disconnect balthazar and observe that melchior has launched a replacement pod in response.
ken@LASTORDER:~$ k get nodes
NAME STATUS ROLES AGE VERSION
balthazar NotReady control-plane,etcd,master 1d v1.22.7+k3s1
caspar Ready control-plane,etcd,master 1d v1.22.7+k3s1
melchior Ready control-plane,etcd,master 1d v1.22.7+k3s1
ken@LASTORDER:~$ k get pods -n kenellorando -o=custom-columns=NAME:.metadata.name,NODE:.spec.nodeName,STATUS:.status.phase
NAME NODE STATUS
kenellorando-6cbfb9f6f8-268d5 melchior Running
kenellorando-6cbfb9f6f8-9dwzc melchior Running
kenellorando-6cbfb9f6f8-kvf44 caspar Running
I next want to test that the cluster will continue operating when the leader goes down.
Recall that all of the nodes are masters, but only one among them is an elected leader. My expectation is once the leader disappears, the others will notice and one of them will take over managing the cluster.
In full K8s, here’s a “proper” way to identify which node is currently the leader (check the Annotations of kubectl describe endpoints -n kube-system). However, this annotation does not appear to be available in my K3s installation.
I’ll work around that by simply checking that everything still works in each possible one-node-down state, which implies that some leader has assumed control. This simultaneously verifies that the shared virtual IP failover works too, because kubectl is targeting the cluster’s virtual IP (10.0.0.100) for all tests.
The last test already demonstrated that the cluster works with balthazar down. Now I’ll just repeat the disconnect on the other nodes:
ken@LASTORDER:~$ k get nodes
NAME STATUS ROLES AGE VERSION
balthazar Ready control-plane,etcd,master 1d v1.22.7+k3s1
caspar Ready control-plane,etcd,master 1d v1.22.7+k3s1
melchior NotReady control-plane,etcd,master 1d v1.22.7+k3s1
ken@LASTORDER:~$ k get pods -n kenellorando -o=custom-columns=NAME:.metadata.name,NODE:.spec.nodeName,STATUS:.status.phase
NAME NODE STATUS
kenellorando-6cbfb9f6f8-47k2z balthazar Running
kenellorando-6cbfb9f6f8-fgzns balthazar Running
kenellorando-6cbfb9f6f8-kvf44 caspar Running
…
ken@LASTORDER:~$ k get nodes
NAME STATUS ROLES AGE VERSION
balthazar Ready control-plane,etcd,master 1d v1.22.7+k3s1
caspar NotReady control-plane,etcd,master 1d v1.22.7+k3s1
melchior Ready control-plane,etcd,master 1d v1.22.7+k3s1
ken@LASTORDER:~$ k get pods -n kenellorando -o=custom-columns=NAME:.metadata.name,NODE:.spec.nodeName,STATUS:.status.phase
NAME NODE STATUS
kenellorando-6cbfb9f6f8-268d5 melchior Running
kenellorando-6cbfb9f6f8-47k2z balthazar Running
kenellorando-6cbfb9f6f8-fgzns balthazar Running
This demonstrates that the cluster continues to operate with any one node down, implying that leader assumption must be working and that keepalived is automatically reassigning “active” nodes.
So everything works.
| Item | USD (inc. sales tax) |
|---|---|
| 3x Vilros Raspberry Pi 4 8GB | $478.26 |
| C4Labs Cloudlet Case | $102.64 |
| iMBAPrice 6in Cat6 Patch Cables | $15.28 |
| NETGEAR 5-port PoE GBit Switch | $54.59 |
| 3x waveshare PoE HAT for RPi 4B | $77.61 |
The system build total was $728.38.
I originally made these purchases when the electronic component supply chain was only starting to become busted and Raspberry Pis were still somewhat available.
Raspberry Pis are basically unobtainable today (September 2022). A single kit similar to the ones I purchased will likely cost over $300 now.
Assuming each node runs 24/7 at 100% load, while also powering peripheral case fans, the three nodes use 13.31 kWh in one month (Raspberry Pi Power Calculator). At the average US electricity price of $0.139/kWh this is $1.85/month.
Suppose I went to AWS and launched a less powerful single-node t3.medium ($30.37/mo) EKS cluster ($73.00/mo), with a single classic load balancer ($18.25/mo) and a single NAT gateway ($32.85/mo). To be even more biased for AWS, we’ll assume no charges for any data transfer or instance storage.
This still amounts to $154.47 per month. Even under this scenario, and after subtracting my ~$1.85/month of electricity, my cluster pays for itself in about 4.8 months ($728.38 ÷ ($154.47 - $1.85) ≈ 4.8). Not bad, I think.
Estimate by AWS: https://calculator.aws/#/estimate
I truly had a lot of fun doing this. Knowing the apps I’m building are also running on an infrastructure of my own design and build gives me a sense of satisfaction. I’m already using the cluster to run things like https://cadenceradio.com. I’m excited to deep dive into more Kubernetes tech too.
Thanks for reading!