I’ve hated Kubernetes for a long time. Must be nigh on seven years at this point. I got in pretty early when Traefik wasn’t around and it was generally not viable to run a Kubernetes cluster outside the cloud due to the lack of LoadBalancer implementations. StatefulSets weren’t a thing. But everyone else seems to have been crazy for it.
Maybe what made me hostile to Kubernetes was its way of pawning off all the difficult parts of clustering on someone else. When you don’t have to deal with state and mutual exclusion, clustering becomes way easier. But that’s not really solving the underlying problem, just saying “If you’ve figured out the hard parts, feel free to use Kubernetes to simplify the other parts”.
It also doesn’t work in its favor that it is so complex and obtuse. I’ve spent years tinkering with it (on and off, naturally; I don’t engage that frequently with things I hate) and over the past few weeks I’ve gotten a working setup using microk8s and Rook that lets me create persistent volumes in my external Ceph cluster running on Proxmox. I now run my web UI for the pdns authoritative DNS servers in that cluster using a Deployment that can be scaled quite easily:
[root@kube01 ~]# mkctl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
traefik-ingress-controller-gmf7k 1/1 Running 1 20h 192.168.1.172 kube02.svealiden.se <none> <none>
traefik-ingress-controller-94clq 1/1 Running 1 20h 192.168.1.173 kube03.svealiden.se <none> <none>
traefik-ingress-controller-77fxr 1/1 Running 1 20h 192.168.1.171 kube01.svealiden.se <none> <none>
whoami-78447d957f-t82sd 1/1 Running 1 20h 10.1.154.29 kube01.svealiden.se <none> <none>
whoami-78447d957f-bwg7p 1/1 Running 1 20h 10.1.154.30 kube01.svealiden.se <none> <none>
pdnsadmin-deployment-856dcfdfd8-5d45p 1/1 Running 0 50s 10.1.173.137 kube02.svealiden.se <none> <none>
pdnsadmin-deployment-856dcfdfd8-rdv88 1/1 Running 0 50s 10.1.246.237 kube03.svealiden.se <none> <none>
pdnsadmin-deployment-856dcfdfd8-vmzws 1/1 Running 0 50s 10.1.154.44 kube01.svealiden.se <none> <none>
[root@kube01 ~]# mkctl scale deployment.v1.apps/pdnsgui-deployment --replicas=2
Error from server (NotFound): deployments.apps "pdnsgui-deployment" not found
[root@kube01 ~]# mkctl get deployments
NAME READY UP-TO-DATE AVAILABLE AGE
whoami 2/2 2 2 20h
pdnsadmin-deployment 3/3 3 3 2m23s
[root@kube01 ~]# mkctl scale deployment.v1.apps/pdnsadmin-deployment --replicas=2
deployment.apps/pdnsadmin-deployment scaled
[root@kube01 ~]# mkctl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
traefik-ingress-controller-gmf7k 1/1 Running 1 20h 192.168.1.172 kube02.svealiden.se <none> <none>
traefik-ingress-controller-94clq 1/1 Running 1 20h 192.168.1.173 kube03.svealiden.se <none> <none>
traefik-ingress-controller-77fxr 1/1 Running 1 20h 192.168.1.171 kube01.svealiden.se <none> <none>
whoami-78447d957f-t82sd 1/1 Running 1 20h 10.1.154.29 kube01.svealiden.se <none> <none>
whoami-78447d957f-bwg7p 1/1 Running 1 20h 10.1.154.30 kube01.svealiden.se <none> <none>
pdnsadmin-deployment-856dcfdfd8-rdv88 1/1 Running 0 2m33s 10.1.246.237 kube03.svealiden.se <none> <none>
pdnsadmin-deployment-856dcfdfd8-vmzws 1/1 Running 0 2m33s 10.1.154.44 kube01.svealiden.se <none> <none>
pdnsadmin-deployment-856dcfdfd8-5d45p 0/1 Terminating 0 2m33s 10.1.173.137 kube02.svealiden.se <none> <none>
[root@kube01 ~]# mkctl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
traefik-ingress-controller-gmf7k 1/1 Running 1 20h 192.168.1.172 kube02.svealiden.se <none> <none>
traefik-ingress-controller-94clq 1/1 Running 1 20h 192.168.1.173 kube03.svealiden.se <none> <none>
traefik-ingress-controller-77fxr 1/1 Running 1 20h 192.168.1.171 kube01.svealiden.se <none> <none>
whoami-78447d957f-t82sd 1/1 Running 1 20h 10.1.154.29 kube01.svealiden.se <none> <none>
whoami-78447d957f-bwg7p 1/1 Running 1 20h 10.1.154.30 kube01.svealiden.se <none> <none>
pdnsadmin-deployment-856dcfdfd8-rdv88 1/1 Running 0 2m39s 10.1.246.237 kube03.svealiden.se <none> <none>
pdnsadmin-deployment-856dcfdfd8-vmzws 1/1 Running 0 2m39s 10.1.154.44 kube01.svealiden.se <none> <none>
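The Rook side, for reference, mostly comes down to PersistentVolumeClaims against a StorageClass that Rook sets up for the external Ceph cluster. A minimal sketch (the class name rook-ceph-block is the one from the standard Rook examples and the claim name is made up, so the details may differ from my actual setup):
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pdnsadmin-data
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  # StorageClass created by Rook, backed by the external Ceph cluster
  storageClassName: rook-ceph-block
Pods then reference the claim like any other volume and Rook/Ceph handles the provisioning.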
Yet I have only the flimsiest idea of how it works and how to fix it if something breaks. Maybe I’ll learn how Calico uses VXLAN to connect everything magically, and why I had to reset my entire cluster on Thursday to remove a custom resource definition.
By the way, try creating a custom resource definition without the beta API! You’ll have to provide a structural schema: https://kubernetes.io/docs/tasks/extend-kubernetes/custom-resources/custom-resource-definitions/
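To give a flavour of what that means, even a bare-minimum CRD against the v1 API needs something along these lines (a made-up widgets.example.com resource, purely to show the required openAPIV3Schema block):
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  # must be <plural>.<group>
  name: widgets.example.com
spec:
  group: example.com
  scope: Namespaced
  names:
    kind: Widget
    singular: widget
    plural: widgets
  versions:
  - name: v1
    served: true
    storage: true
    schema:
      # the structural schema the v1 API insists on
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            properties:
              size:
                type: integer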
I suspect many people will go mad trying to make heads or tails of that page. Anyway, if I had to decide how to run a set of microservices in production I’d still go with something like Docker containers run as systemd services and HAProxy to load balance the traffic. Less automation for rolling upgrades, scaling and so on, but I wouldn’t be worried about relying entirely on a system where it’s not even clear that I can find a way to describe the malfunction that keeps all my services from running! I mean, I added podAntiAffinity to my deployment before:
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchExpressions:
        - key: app
          operator: In
          values:
          - pdns-admin-gui
      topologyKey: "kubernetes.io/hostname"
And this is what all three pods logged when I started the deployment:
Warning FailedScheduling 39m default-scheduler 0/3 nodes are available: 3 node(s) didn't match pod affinity/anti-affinity rules, 3 node(s) didn't match pod anti-affinity rules.
Warning FailedScheduling 39m default-scheduler 0/3 nodes are available: 3 node(s) didn't match pod affinity/anti-affinity rules, 3 node(s) didn't match pod anti-affinity rules.
Normal Scheduled 39m default-scheduler Successfully assigned default/pdnsadmin-deployment-856dcfdfd8-vmzws to kube01.svealiden.se
Normal Pulled 39m kubelet Container image "cacher.svealiden.se:5000/pdnsadmin:20210925" already present on machine
Normal Created 39m kubelet Created container pdnsadmin
Normal Started 39m kubelet Started container pdnsadmin
Obviously “Successfully assigned default/pdnsadmin-deployment-856dcfdfd8-vmzws to kube01.svealiden.se” was a good thing, but Kubernetes telling me that zero nodes were initially available? When the only requirement I set out was that pdns-admin-gui pods shouldn’t run on the same node? And there are three nodes? And I asked for three pods? That’s the sort of stuff that would make me very nervous if this were used in production. What if Kubernetes gets stuck in the “All pods forbidden because reasons”-mode?
This is also why I’m terrified of running Ceph for production applications. Multiple independent Ceph clusters? Okay, now we’re talking, but a single cluster? It’s just a matter of time before Ceph locks up on you and you have a week’s downtime while trying to figure out what the hell is going on.
The keen observer will ask “Aren’t you a massive fan of clusters?” and that’s entirely correct. I’ve run Ceph for my own applications for just over two years and have Elasticsearch, MongoDB and MariaDB clusters set up. But the key point is disaster recovery. Clusters can be great for high availability, which is what I’m really a fan of, but clusters without an override for when the complex logic of the distributed system goes on the fritz are a huge gamble. If MongoDB gets confused I can choose a node and force it to become the primary. If I can get the others to join afterwards, that’s fine; otherwise I’ll just have to blank them and rebuild. Same with MariaDB: I can kill two of the nodes, make the remaining one the master and take it from there. I don’t need any distributed algorithm to give me permission to bring systems back in a diminished capacity.
By the way, nothing essential runs on my Ceph cluster. Recursive DNS servers, file shares, backups of those file shares and so on all run on local storage in a multi-master or master/slave configuration. Ceph going down will disable some convenience functions, my Zabbix monitoring server, Prometheus, Grafana and so on, but I can live without them for a couple of hours while I swear angrily at Ceph. In fairness, I haven’t had a serious Ceph issue for (checking…) about a year now!