Create a complete cheapo Kubernetes cluster

January 16, 2020

This post is a guide to setting up a bare-metal Kubernetes cluster on Hetzner Cloud.

Why? Well, previously I had a couple of applications running on a Google Cloud virtual server. They were running on FreeBSD because my original plan had been to use it as an opportunity to learn a bit about FreeBSD. This failed because I’d forgotten that the end goal of setting up a server is to interact with it as little as possible. Once things were working I didn’t really want to dick around with it and so my main FreeBSD-specific experiences were limited to fiddling about getting things to build properly. Because I didn’t interact with the server very often I kept forgetting how to do things and had to rely on past me having thought to write copious notes. Doing per-application backups was a bit fiddly and generally I didn’t have good visibility on what was going on. Since I wanted to play with Kubernetes anyway I thought it would be a good opportunity to improve my deployments, backup strategy and general plumbing.

⚠️Warning: over-engineering alert ⚠️

Before you start spluttering over your keyboard as you furiously tweet about the state of programmers today, I am fully aware that “I need a Kubernetes cluster” does not immediately follow from “I didn’t have a good deployment pipeline”. There are many other, easier and more sensible solutions that I would have applied in the workplace. I just wanted an excuse to justify it to myself. My main aims were to refresh my Kubernetes-fu, remind myself how frustrating the kubectl interface is and see how the ecosystem had totally changed in the few months since I’d last used it.

Update Jan 2021

So it turns out a pretty major bit I missed off was certificate renewal and my cluster’s certificates expired after a year. This is where my problems started.

I ran the kubeadm certificate renewal command and it appeared to work, but the cluster wasn’t accessible. After a bit of Googling, I found that the renewal command doesn’t always update ~/.kube/config correctly or leave things in a usable state. Awkward. I updated the config to have the correct certificate data but still no luck. I tried researching further but everything I found seemed to implicitly refer to a different version of Kubernetes and things kept changing, so it was hard to know what would make things better and what would bugger things up even more.

Since I had backups, I thought I could just re-run my cluster deployment script and restore from backup. Silly me! My script was utterly broken in multiple ways since I had the temerity to leave it for a whole twelve months. After a few attempts at making the necessary (and fairly arbitrary) changes, I gave up and deployed a new DigitalOcean droplet in a few minutes.

Kubernetes is obviously a complex system trying to solve a difficult problem, but the amount of incidental complexity caused by churn is insane. It’s like pre-React JavaScript in every bad way.

As things stand, this blog post is contributing to the problem by presenting out-of-date information. If you’re reading this now, be aware you’ll need to research and update each step. It won’t just work.

Part of the problem is that hetzner-kube is pretty out of date and the maintainer is not accepting PRs. The core of it should work if you build from master, but the addons I tried were broken. People have documented the problems in various issues, so you could probably muddle through if you really wanted to.

Or you could just deploy a managed cluster somewhere and save yourself having to redo everything in a few months.

Cluster creation

I used hetzner-kube, a dinky little CLI that handles a lot of the basics. It also has some very useful “addons”, which are common applications with deployments tweaked to work well with Hetzner Cloud. What with Kubernetes being Kubernetes, things get out of date quickly, so I did have to delve through a few old issues to make some things work. I’m hoping to update the project documentation. In the meantime this blog documents my steps.

So, assuming you’ve already signed up to Hetzner Cloud, follow the instructions in the hetzner-kube README to get an API token. Set up that and your SSH keys (I’m filling in the example with some names I used):

$ hetzner-kube context add k8s
Token: <PASTE-TOKEN-HERE>
$ hetzner-kube ssh-key add -n personal

Now we’re ready to create a cluster. But hold your horses!

The Kubernetes master node requires more than one vCPU. By default hetzner-kube deploys a CX11 VPS, which only has one vCPU. We need to specify CX21 for the master node.

For storage we’re going to be using Hetzner block storage. I read in an issue that the nodes need to be in the same datacentre to be able to mount volumes. By default hetzner-kube spreads the nodes around Hetzner Cloud datacentres but it lets you specify a particular one. I chose Nuremberg in the hope that the servers would like it there.

Finally, I want to be able to use a floating IP to access the cluster. This means we don’t need to worry about node IP addresses changing if workers get replaced. For now we just need to register the floating IP with each worker node. Create a floating IP in the Hetzner Cloud interface and create a file called cloud-init-config containing the following (slightly hacky – improvements welcome) script:

#cloud-config
package_update: true
write_files:
  - path: /etc/network/interfaces.d/60-floating-ip.cfg
    content: |
      auto eth0:1
      iface eth0:1 inet static
        address YOUR.IP.GOES.HERE
        netmask 32
  - path: /opt/cloud-init-scripts/setup-floating-ip.sh
    content: |
      if [[ $(hostname -s) =~ worker ]]; then
          echo "Setting up virtual IP..."
          systemctl restart networking.service
      else
          rm /etc/network/interfaces.d/60-floating-ip.cfg
      fi
runcmd:
  - [ bash, /opt/cloud-init-scripts/setup-floating-ip.sh ]

Obviously enough, replace YOUR.IP.GOES.HERE with your floating IP. If you haven’t come across cloud-init before, it’s just a way of specifying some commands to run on node creation. This script ensures that the workers know about the floating IP.
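If you’d rather not edit the file by hand, the substitution can be scripted. This is only a rough sketch: render_floating_ip is a name I’ve made up, it expects the config on stdin, and 203.0.113.10 is a documentation address rather than a real floating IP.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of hetzner-kube): read a cloud-init
# template on stdin and replace the YOUR.IP.GOES.HERE placeholder with
# the floating IP given as the first argument.
render_floating_ip() {
    local ip="$1"
    # The dots are escaped so that only the literal placeholder matches.
    sed "s/YOUR\.IP\.GOES\.HERE/${ip}/g"
}

# Example with a one-line template.
printf '  address YOUR.IP.GOES.HERE\n' | render_floating_ip 203.0.113.10
```

Assuming you keep the untouched file around as a template (say cloud-init-template, another made-up name), something like render_floating_ip 203.0.113.10 < cloud-init-template > cloud-init-config should produce the file that the cluster creation command expects.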

Putting that all together the cluster creation command is like so:

$ hetzner-kube cluster create \
             --name k8s \
             --ssh-key personal \
             -w 2 \
             --master-server-type cx21 \
             --cloud-init cloud-init-config \
             --datacenters fsn1-dc14

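Tweak the worker count to your satisfaction. Note that fsn1-dc14 is the Falkenstein datacentre and the Nuremberg one is nbg1-dc3, so use whichever you actually want as long as all the nodes end up in the same place. After a while your cluster should be up and running!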

Firewall

hetzner-kube doesn’t set up a firewall, though there’s an open PR, so this is the next step. Install the hcloud CLI from Hetzner Cloud. SSH into each node (including master) using hcloud server ssh server-name and run the following, taken from the linked PR:

$ ufw --force reset && \
  ufw allow in from MASTER.IP.GOES.HERE to any && \
  ufw allow in from WORKER1.IP.GOES.HERE to any && \
  ufw allow in from WORKER2.IP.GOES.HERE to any && \
  ufw allow 6443 && \
  ufw allow ssh && \
  ufw allow in from 10.244.0.0/16 to any && \
  ufw allow 80 && \
  ufw allow 443 && \
  ufw default deny incoming && \
  ufw --force enable

Substitute the node IPs to suit your configuration. This allows requests from each node, the control plane and tools like kubectl. I think this is secure – please let me know if not!
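Typing the node IPs into that command on every node is an easy place to make a mistake. Here is a sketch that prints the same rules for any list of node IPs so you can read them over before running anything. print_ufw_rules is my own invention, and the addresses in the example are documentation addresses.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the ufw commands from the post for a given
# list of node IPs (master and workers). Nothing is executed here; the
# output is meant to be reviewed first.
print_ufw_rules() {
    local ip
    echo "ufw --force reset"
    for ip in "$@"; do
        echo "ufw allow in from ${ip} to any"
    done
    echo "ufw allow 6443"
    echo "ufw allow ssh"
    echo "ufw allow in from 10.244.0.0/16 to any"
    echo "ufw allow 80"
    echo "ufw allow 443"
    echo "ufw default deny incoming"
    echo "ufw --force enable"
}

# Example: one master and two workers.
print_ufw_rules 203.0.113.1 203.0.113.2 203.0.113.3
```

Once you’re happy with the output you can paste it into the SSH session on each node. Unlike the original one-liner, the printed commands aren’t chained with &&, so keep an eye out for errors as they run.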

Basic setup

Now that we have a secured cluster in operation, it’s time to start deploying things. To start off we’ll deploy a few addons. First of all, deploy hcloud-controller-manager so that the cluster can talk to Hetzner Cloud’s APIs:

$ hetzner-kube cluster addon install -n k8s hcloud-controller-manager

Then Helm, the Kubernetes package manager:

$ hetzner-kube cluster addon install -n k8s helm

Run the following command to update your local kubeconfig (~/.kube/config):

$ hetzner-kube cluster kubeconfig k8s -f

You should now be able to connect to your cluster using kubectl on your local machine!

Ingress

Ingress is a bit of a pain on Hetzner Cloud. We need an ingress controller so that requests from outside the cluster can get routed to the correct service inside the cluster. If you’re running a cluster in a hosted environment like GKE or EKS, the cloud provider includes an ingress controller that makes use of the cloud provider’s load balancer. At the time of writing Hetzner Cloud doesn’t have such load balancers, so we’re on our own.

The solution depends on how much availability you’re aiming for. I’m not shooting for high availability so I’m happy with brief outages as long as everything fixes itself without intervention.

At first I looked at metallb, a bare-metal load balancer. As I understand it metallb basically replicates what a cloud-provided load balancer would do. Unfortunately metallb doesn’t support Hetzner Cloud and I couldn’t get it to work even when following these instructions from a Hetzner person.

Next I tried a standard nginx-ingress setup with fip-controller. Deploying nginx-ingress is easy via the addon:

$ hetzner-kube cluster addon install -n k8s nginx-ingress-controller

The addon configures the controller with hostNetwork: true so that it binds directly to the node’s network ports 80 and 443 (see here for details). This does mean that only a single ingress controller pod can run on each node (because only one process can bind to a given port). For me this is fine because I only need a single ingress controller for my web-based services. You’re on your own if you want to do anything more fancy, I’m afraid.

You can verify the controller is working by doing the following for each worker:

$ curl WORKER.1.IP.ADDRESS
default backend - 404.
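With more than a couple of workers this gets tedious by hand. A rough sketch of a loop follows. check_workers is a name I’ve made up, and it assumes the controller answers plain HTTP on port 80 as the addon sets it up.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of hetzner-kube): request the root path
# from each worker over plain HTTP and print what came back.
check_workers() {
    local ip body
    for ip in "$@"; do
        # --max-time stops a dead node from hanging the loop; if curl
        # fails, the node is reported as unreachable instead.
        body="$(curl --silent --max-time 5 "http://${ip}/" || echo unreachable)"
        printf '%s: %s\n' "$ip" "$body"
    done
}
```

Call it with your worker addresses, e.g. check_workers WORKER.1.IP.ADDRESS WORKER.2.IP.ADDRESS. A healthy worker should report the same default backend 404 message as above.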

If you add all the worker IPs to your DNS records everything should work hunky-dory. Should a worker fall over, however, some requests will get routed to a failed node. One solution would be to manage your DNS records programmatically from within the cluster using AWS’s Route 53 or some such.

Instead let’s use the floating IP set up earlier. A floating IP can be assigned to a node in the Hetzner Cloud interface. If that node then fails, requests to the floating IP will fail. So, we use fip-controller to continually check that the floating IP is assigned to a working node. Get the YAML files from here and apply:

$ kubectl create namespace fip-controller
$ kubectl create -f hcloud-fip-controller/rbac.yaml
$ kubectl create -f hcloud-fip-controller/deploy/daemonset.yaml

Then apply the configuration. The controller needs the floating IP(s) and your Hetzner Cloud API token:

$ cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: fip-controller-config
  namespace: fip-controller
data:
  config.json: |
    {
      "hcloud_floating_ips": [
        "YOUR.FLOATING.IP.HERE"
      ]
    }
---
apiVersion: v1
kind: Secret
metadata:
  name: fip-controller-secrets
  namespace: fip-controller
stringData:
  HCLOUD_API_TOKEN: YOUR_TOKEN_HERE
EOF

Your floating IP should now gracefully move to another node if you shut down the currently-assigned node.

I have configured my DNS records to all point to the floating IP. The request goes to the floating IP and gets routed to the assigned node, where it is handled by nginx-ingress. New worker nodes are handled automatically as long as they have the cloud-init config, seen earlier, that sets up the floating IP. A downside is that fip-controller only polls and makes changes every 30s so there could theoretically be up to 30s when the floating IP is assigned to a node that’s just failed.

Note: remember that you need to create an Ingress resource before you can actually reach any cluster services. Mine looks a bit like this, with some extraneous details cut out:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  annotations:
    kubernetes.io/ingress.class: nginx # IMPORTANT
  labels:
    # ...
  name: ingress
spec:
  rules:
  - host: roll.of.thunder.com
    http:
      paths:
      - backend:
          serviceName: thunder
          servicePort: http
        path: /
  - host: hear.my.cry.com
    http:
      paths:
      - backend:
          serviceName: cry
          servicePort: http
        path: /

It’s the annotation that links it to nginx-ingress. Whenever you deploy a new service, just edit the Ingress to add a new host and off you go!

Another school of thought (see Reddit discussion here) says that you should create a different Ingress for each service. If you’re using nginx-ingress, an Ingress resource is really only a bit of configuration for nginx. It doesn’t create an actual load balancer as it would in AWS or GCP. Currently I have everything in one Ingress. The limitation is that an Ingress can only refer to services in its own namespace, so everything ends up in the same namespace. Using multiple Ingresses would make it easier to compartmentalise things.

Storage

We need some way of creating PersistentVolumes. The easy option is to use hetzner-csi so that the cluster can create PersistentVolumes out of Hetzner Cloud block storage volumes. You can skip this if you don’t need to handle stateful apps.

$ hetzner-kube cluster addon install -n k8s hetzner-csi

Creating a PersistentVolumeClaim resource will now create a volume. Try it by applying the following (from here):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: csi-pvc
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: hcloud-volumes
---
kind: Pod
apiVersion: v1
metadata:
  name: my-csi-app
spec:
  containers:
    - name: my-frontend
      image: busybox
      volumeMounts:
      - mountPath: "/data"
        name: my-csi-volume
      command: [ "sleep", "1000000" ]
  volumes:
    - name: my-csi-volume
      persistentVolumeClaim:
        claimName: csi-pvc

You should see a new volume pop up in the Hetzner Cloud web interface.

The advantage of hetzner-csi is that it’s very straightforward to use. There are some disadvantages. First of all, the minimum volume size is 10GB. This is kind of a lot if you only need space for a small database. You might try allocating one PV and sharing it amongst many pods but, more annoyingly, the interface only supports ReadWriteOnce access and not ReadWriteMany. This means that a volume can only be mounted read-write by a single node at a time, so pods on different nodes can’t share a persistent volume. Be careful when deploying stateful applications. If you try to do a rolling deployment, the new pod will spin up and can get stuck trying to claim the PV that’s already held by the old pod. You need to use the Recreate strategy for deployments, which does entail some downtime.
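To make the Recreate point concrete, here is a minimal sketch of a Deployment that mounts a claim and sets the strategy. The helper name recreate_deployment, the app name and the /data mount path are all placeholders of mine. The busybox image and csi-pvc claim are borrowed from the test manifest above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a minimal Deployment manifest that uses the
# Recreate strategy, so the old pod is stopped (and releases its volume)
# before the new one starts. Arguments: app name, image, claim name.
recreate_deployment() {
    local name="$1" image="$2" claim="$3"
    cat <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ${name}
spec:
  replicas: 1
  strategy:
    type: Recreate
  selector:
    matchLabels:
      app: ${name}
  template:
    metadata:
      labels:
        app: ${name}
    spec:
      containers:
      - name: ${name}
        image: ${image}
        command: [ "sleep", "1000000" ]
        volumeMounts:
        - mountPath: /data
          name: data
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: ${claim}
EOF
}

recreate_deployment my-app busybox csi-pvc
```

Pipe the output into kubectl apply -f - to try it out. The trade-off is the one mentioned above: there is a short gap with no pod running during each rollout.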

I see OpenEBS and Rook are also popular options. They may work well. I tried deploying Rook, thinking that I could make use of the HDDs of each node, but it immediately began complaining about limited space and fell over after a few days. I’m sure it works well but I just feel safer using separate volumes.

What I’ve settled on is just creating a new 10GB volume for every stateful application. This is needlessly resource-intensive but:

  1. That ship sailed long ago, hun, we’re doing a personal Kubernetes cluster
  2. A 10GB volume only costs 0.48€ per month
  3. Complete separation makes it easier to e.g. restore one application without affecting any others.

Backups

At this point the cluster is basically ready for whatever you want to throw at it. While we’re on the theme of storage and volumes let’s get backups sorted. I store my backups in Wasabi because it’s cheap and S3-compatible (so it mostly works with tools that expect S3). The instructions below will be for Wasabi but you can substitute your provider easily.

I use Velero with Restic to handle backups. I’m slightly hazy on the distinction between Velero and Restic, but as I understand it Velero backs up the cluster resources and orchestrates everything, while Restic copies the contents of the volumes into the bucket (please correct me if you know more). Assuming you’re using Wasabi, set up an account, make a backup bucket and create an access key for Velero. You need to save the key ID and secret in a file in the AWS credentials format, e.g.:

# in credentials-wasabi
[default]
aws_access_key_id=MAH_KEY_ID
aws_secret_access_key=MAH_ACCESS_SECRET

Install Velero locally. Once that’s done, here’s the command I used to deploy it to the cluster:

$ velero install \
  --provider aws \
  --plugins velero/velero-plugin-for-aws:v1.0.0 \
  --bucket MAH_BUCKET_NAME \
  --secret-file ./credentials-wasabi \
  --backup-location-config region=eu-central-1,s3ForcePathStyle="true",s3Url=https://s3.eu-central-1.wasabisys.com/ \
  --snapshot-location-config region=eu-central-1 \
  --use-restic

You might need to change the region depending on how you set up your bucket. I don’t have versioning, logging or public access enabled on my bucket.

Velero will set up a whole load of stuff. Once it’s done you can set up a backup schedule:

$ velero create schedule 3hourbackup --schedule="@every 3h" --ttl 2160h0m0s --include-cluster-resources

This schedules a backup every three hours and keeps it for three months (2160 hours). It would be better, I think, to do a backup per namespace (there are flags to filter the resources that are backed up) but I have poor naming practices and have put most things into the default namespace so I’m just pulling everything.
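The --ttl value is a Go-style duration, which tops out at hours, so retention periods in days need converting. A throwaway sketch of the arithmetic (days_to_ttl is my own name, not a Velero command):

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a retention period in days into the
# hours-based duration string passed to --ttl.
days_to_ttl() {
    local days="$1"
    echo "$(( days * 24 ))h0m0s"
}

days_to_ttl 90   # prints 2160h0m0s
```

So a schedule that keeps backups for 30 days instead would use --ttl "$(days_to_ttl 30)".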

A backup is no use if you don’t check that restore works, as GitLab found out. Restoring from a backup is easy:

$ velero backup get
# Choose your backup and...
$ velero restore create --from-backup BACKUP_NAME

Velero seems to be designed more for when something gets totally broken and is removed from the cluster. If the resource already exists the restore won’t change anything. This is handy if you’re like me and have shoved everything into one backup. I think of it as like running kubectl apply.

This DigitalOcean article has really good step-by-step instructions for setting up and testing Velero if you get stuck.

Note: by default Velero doesn’t include the contents of persistent volumes in its backups. If you want them included, you need to annotate the pod that mounts the volume, listing the pod’s volume names. The syntax is as follows:

backup.velero.io/backup-volumes: volume-name-1,volume-name-2

You can use kubectl edit on an existing deployment or pass Helm the options. For example if you want to deploy Prometheus here’s how you add the annotations (determined by delving through the chart documentation):

$ helm install prometheus stable/prometheus \
     --set 'server.podAnnotations.backup\.velero\.io/backup-volumes=storage-volume' \
     --set 'alertmanager.podAnnotations.backup\.velero\.io/backup-volumes=storage-volume' \
     --set server.strategy.type=Recreate

storage-volume and alertmanager are the names used within the Prometheus configuration to refer to the claims. Note too that we specify a Recreate strategy due to the RWO access mode limitation (see the storage section).
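For a pod that’s already running, the annotation can also be added from the command line. Here is a small sketch, assuming the pod and volume names from the storage test manifest (my-csi-app and my-csi-volume). velero_volumes_annotation is a made-up helper that just joins the volume names with commas.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the Velero annotation for one or more pod
# volume names, joined with commas.
velero_volumes_annotation() {
    local IFS=,
    echo "backup.velero.io/backup-volumes=$*"
}

velero_volumes_annotation my-csi-volume
```

You would then run something like kubectl annotate pod my-csi-app "$(velero_volumes_annotation my-csi-volume)". Bear in mind that an annotation added this way lives on that one pod. For pods managed by a Deployment it belongs in the pod template, otherwise it disappears when the pod is recreated.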

TLS certificates

By now your cluster has everything you need to deploy applications. Once apps have their DNS records configured to point to the cluster you can set up TLS! hetzner-kube has a cert-manager addon but it seems to be broken upstream. So, following the manual instructions in the linked issue:

$ kubectl create namespace cert-manager
$ kubectl apply --validate=false \
        -f https://raw.githubusercontent.com/jetstack/cert-manager/release-0.12/deploy/manifests/00-crds.yaml
$ helm repo add jetstack https://charts.jetstack.io
$ helm install cert-manager --namespace cert-manager jetstack/cert-manager

As ever, DigitalOcean has a good guide. The TL;DR is that you create a ClusterIssuer:

apiVersion: cert-manager.io/v1alpha2
kind: ClusterIssuer
metadata:
  name: letsencrypt-production
  namespace: cert-manager
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: YOUR_EMAIL_HERE
    privateKeySecretRef:
      name: letsencrypt-production
    solvers:
      - http01:
          ingress:
            class: nginx

And then add an annotation and tls section to your Ingress:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  annotations:
    kubernetes.io/ingress.class: nginx
    cert-manager.io/cluster-issuer: letsencrypt-production # NEW
  labels:
    # ...
  name: ingress
spec:
  rules:
  - host: roll.of.thunder.com
    http:
      paths:
      - backend:
          serviceName: thunder
          servicePort: http
        path: /
  - host: hear.my.cry.com
    http:
      paths:
      - backend:
          serviceName: cry
          servicePort: http
        path: /
  # NEW SECTION:
  tls:
  - hosts:
    - roll.of.thunder.com
    secretName: thunder-tls
  - hosts:
    - hear.my.cry.com
    secretName: cry-tls

cert-manager goes off and sets everything up. In a few minutes you should have HTTPS access to your hosts!

Total cost

Assuming you end up using four PVs the monthly cost (not including Wasabi) ends up at:

1 x CX21 @ 5.83€ = 5.83€

2 x CX11 @ 2.96€ = 5.92€

40GB PV @ 0.48€ per 10GB = 1.92€

1 x floating IP @ 1.19€ = 1.19€

Total: 14.86€
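If your worker or volume count differs, the sum is easy to redo. A quick sketch using the prices quoted above (monthly_total is my own helper name, and the prices are simply hard-coded from this post):

```shell
#!/usr/bin/env bash
# Hypothetical helper: monthly cost in euros for one CX21 master, a given
# number of CX11 workers and 10GB volumes, plus one floating IP.
# Arguments: worker count (default 2), volume count (default 4).
monthly_total() {
    local workers="${1:-2}" volumes="${2:-4}"
    # LC_ALL=C keeps the decimal point a dot whatever the locale.
    LC_ALL=C awk -v w="$workers" -v v="$volumes" \
        'BEGIN { printf "%.2f\n", 5.83 + w * 2.96 + v * 0.48 + 1.19 }'
}

monthly_total 2 4   # prints 14.86
```

Adding a third worker, for example, is monthly_total 3 4, which comes to 17.82.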

More expensive than my little GCP free tier FreeBSD box but not bad all considering. And what price can you put on the joy that kubectl brings into your life?

TODOs

I haven’t got as far as logging yet. I’m almost certainly going to use fluentd with Stackdriver.

In a two-worker setup if one worker fails everything piles on to the remaining worker. Would be good to remove the taint so that pods can be deployed on master if need be.
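For what it’s worth, removing the taint looks like a one-liner. This is a sketch I haven’t run against this cluster: the taint key is the one kubeadm used on clusters of this vintage (an assumption, so check yours with kubectl describe node first), and untaint_master is just a wrapper name of mine.

```shell
#!/usr/bin/env bash
# Sketch: remove the NoSchedule taint that kubeadm puts on the master so
# that ordinary pods may be scheduled there. The trailing "-" on the
# taint key means "remove". The key itself is an assumption; newer
# clusters may use a different one.
untaint_master() {
    kubectl taint nodes --all node-role.kubernetes.io/master-
}
```

Because of --all, kubectl will grumble that the taint was not found on the workers, which is harmless.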

Improvements

If you try following this and run into hideous problems, or you find any grievous errors, please let me know.