Kube-router failure

This is just darling:

kube-system      kube-router-7v944                               0/1     CrashLoopBackOff   10         11h   192.168.1.172   kube02.svealiden.se   <none>           <none>
default          grafana-67d6bc9f96-lp2fk                        0/1     Running            3          11h   10.32.1.90      kube03.svealiden.se   <none>           <none>
default          pdnsadmin-deployment-b65c568dd-kd7x4            0/1     Running            8          31d   10.32.0.92      kube02.svealiden.se   <none>           <none>
kube-system      kube-router-nrz6v                               0/1     CrashLoopBackOff   10         11h   192.168.1.173   kube03.svealiden.se   <none>           <none>
kube-system      kube-router-9mmfc                               0/1     CrashLoopBackOff   10         11h   192.168.1.171   kube01.svealiden.se   <none>           <none>
default          zbxserver-b58857598-njf26                       0/1     Running            5          23d   10.32.0.90      kube02.svealiden.se   <none>           <none>
default          pdnsadmin-deployment-b65c568dd-rdtft            0/1     Running            11         11h   10.32.2.113     kube01.svealiden.se   <none>           <none>
default          pdnsadmin-deployment-b65c568dd-s2w4n            0/1     Running            5          11d   10.32.1.93      kube03.svealiden.se   <none>           <none>
default          grafana-67d6bc9f96-ws7dw                        0/1     Running            6          27d   10.32.0.89      kube02.svealiden.se   <none>           <none>

Kube-router is the connection fabric for pods, so all instances being down is suboptimal. It turns out the file kube-router needs to connect to the Kubernetes API couldn’t be found:

[root@kube01 ~]# mkctl logs -f kube-router-lrtxp -n kube-system
I1126 07:44:26.337591       1 version.go:21] Running /usr/local/bin/kube-router version v1.3.2, built on 2021-11-03T18:24:15+0000, go1.16.7
Failed to parse kube-router config: Failed to build configuration from CLI: stat /var/lib/kube-router/client.config: no such file or directory

This was a surprise to me since I hadn’t changed any config. I know because I was asleep! None of this is critical stuff so it’s no biggie, but I got kind of curious: was this a microk8s thing or a Kubernetes thing? I suspect it’s a microk8s thing, with the path mounted at /var/lib/kube-router/ referencing a specific snap version of microk8s. Not that I upgraded it while asleep, admittedly, but that seems more likely than Kubernetes randomly fiddling with a DaemonSet configuration.
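
If that guess is right, the kubeconfig volume in the kube-router DaemonSet would look something like this. A sketch only; the snap revision path below is hypothetical:

volumes:
- name: kubeconfig
  hostPath:
    # Hypothetical path: if this points at a specific snap revision
    # instead of the "current" symlink, a snap refresh would orphan it.
    path: /var/snap/microk8s/2546/credentials/client.config
    type: File
containers:
- name: kube-router
  volumeMounts:
  - name: kubeconfig
    mountPath: /var/lib/kube-router/client.config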

Anyway… Think I’m going to get myself acquainted with Nomad and Consul for a while…

Addendum: Kubernetes is back up and running by the way. I just had to run mkctl edit ds kube-router -n kube-system a couple of times and fiddle some values back and forth.

Kubernetes and clusters in general

I’ve hated Kubernetes for a long time. Must be nigh on seven years at this point. I got in pretty early when Traefik wasn’t around and it was generally not viable to run a Kubernetes cluster outside the cloud due to the lack of LoadBalancer implementations. StatefulSets weren’t a thing. But everyone else seems to have been crazy for it.

Maybe what made me hostile to Kubernetes was its way of pawning off all the difficult parts of clustering on someone else. When you don’t have to deal with state and mutual exclusion, clustering becomes way easier. But that’s not really solving the underlying problem, just saying “If you’ve figured out the hard parts, feel free to use Kubernetes to simplify the rest”.

It also doesn’t work in its favor that it is so complex and obtuse. I’ve spent years tinkering with it (on and off, naturally; I don’t engage that frequently with things I hate), and over the past few weeks I’ve got a working setup using microk8s and Rook that lets me create persistent volumes in my external Ceph cluster running on Proxmox. I now run the web UI for my pdns authoritative DNS servers in that cluster, using a Deployment that can be scaled quite easily:

[root@kube01 ~]# mkctl get pods -o wide
NAME                                    READY   STATUS    RESTARTS   AGE   IP              NODE                  NOMINATED NODE   READINESS GATES
traefik-ingress-controller-gmf7k        1/1     Running   1          20h   192.168.1.172   kube02.svealiden.se   <none>           <none>
traefik-ingress-controller-94clq        1/1     Running   1          20h   192.168.1.173   kube03.svealiden.se   <none>           <none>
traefik-ingress-controller-77fxr        1/1     Running   1          20h   192.168.1.171   kube01.svealiden.se   <none>           <none>
whoami-78447d957f-t82sd                 1/1     Running   1          20h   10.1.154.29     kube01.svealiden.se   <none>           <none>
whoami-78447d957f-bwg7p                 1/1     Running   1          20h   10.1.154.30     kube01.svealiden.se   <none>           <none>
pdnsadmin-deployment-856dcfdfd8-5d45p   1/1     Running   0          50s   10.1.173.137    kube02.svealiden.se   <none>           <none>
pdnsadmin-deployment-856dcfdfd8-rdv88   1/1     Running   0          50s   10.1.246.237    kube03.svealiden.se   <none>           <none>
pdnsadmin-deployment-856dcfdfd8-vmzws   1/1     Running   0          50s   10.1.154.44     kube01.svealiden.se   <none>           <none>

[root@kube01 ~]# mkctl scale deployment.v1.apps/pdnsgui-deployment --replicas=2
Error from server (NotFound): deployments.apps "pdnsgui-deployment" not found

[root@kube01 ~]# mkctl get deployments
NAME                   READY   UP-TO-DATE   AVAILABLE   AGE
whoami                 2/2     2            2           20h
pdnsadmin-deployment   3/3     3            3           2m23s

[root@kube01 ~]# mkctl scale deployment.v1.apps/pdnsadmin-deployment --replicas=2
deployment.apps/pdnsadmin-deployment scaled

[root@kube01 ~]# mkctl get pods -o wide
NAME                                    READY   STATUS        RESTARTS   AGE     IP              NODE                  NOMINATED NODE   READINESS GATES
traefik-ingress-controller-gmf7k        1/1     Running       1          20h     192.168.1.172   kube02.svealiden.se   <none>           <none>
traefik-ingress-controller-94clq        1/1     Running       1          20h     192.168.1.173   kube03.svealiden.se   <none>           <none>
traefik-ingress-controller-77fxr        1/1     Running       1          20h     192.168.1.171   kube01.svealiden.se   <none>           <none>
whoami-78447d957f-t82sd                 1/1     Running       1          20h     10.1.154.29     kube01.svealiden.se   <none>           <none>
whoami-78447d957f-bwg7p                 1/1     Running       1          20h     10.1.154.30     kube01.svealiden.se   <none>           <none>
pdnsadmin-deployment-856dcfdfd8-rdv88   1/1     Running       0          2m33s   10.1.246.237    kube03.svealiden.se   <none>           <none>
pdnsadmin-deployment-856dcfdfd8-vmzws   1/1     Running       0          2m33s   10.1.154.44     kube01.svealiden.se   <none>           <none>
pdnsadmin-deployment-856dcfdfd8-5d45p   0/1     Terminating   0          2m33s   10.1.173.137    kube02.svealiden.se   <none>           <none>

[root@kube01 ~]# mkctl get pods -o wide
NAME                                    READY   STATUS    RESTARTS   AGE     IP              NODE                  NOMINATED NODE   READINESS GATES
traefik-ingress-controller-gmf7k        1/1     Running   1          20h     192.168.1.172   kube02.svealiden.se   <none>           <none>
traefik-ingress-controller-94clq        1/1     Running   1          20h     192.168.1.173   kube03.svealiden.se   <none>           <none>
traefik-ingress-controller-77fxr        1/1     Running   1          20h     192.168.1.171   kube01.svealiden.se   <none>           <none>
whoami-78447d957f-t82sd                 1/1     Running   1          20h     10.1.154.29     kube01.svealiden.se   <none>           <none>
whoami-78447d957f-bwg7p                 1/1     Running   1          20h     10.1.154.30     kube01.svealiden.se   <none>           <none>
pdnsadmin-deployment-856dcfdfd8-rdv88   1/1     Running   0          2m39s   10.1.246.237    kube03.svealiden.se   <none>           <none>
pdnsadmin-deployment-856dcfdfd8-vmzws   1/1     Running   0          2m39s   10.1.154.44     kube01.svealiden.se   <none>           <none>
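
For reference, the Deployment behind this is nothing exotic. A rough sketch; the image and the app label are lifted from the events and affinity snippet further down, the rest is standard boilerplate:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: pdnsadmin-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: pdns-admin-gui
  template:
    metadata:
      labels:
        app: pdns-admin-gui
    spec:
      containers:
      - name: pdnsadmin
        image: cacher.svealiden.se:5000/pdnsadmin:20210925

Scaling is then just a matter of patching spec.replicas, which is what the mkctl scale one-liner above does.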

Yet I have only the flimsiest idea of how it works and how to fix it if something breaks. Maybe I’ll learn how Calico uses VXLAN to connect everything magically, and why I had to reset my entire cluster on Thursday to remove a custom resource definition.

By the way, try creating a custom resource definition without using the beta API! You’ll have to provide a schema: https://kubernetes.io/docs/tasks/extend-kubernetes/custom-resources/custom-resource-definitions/
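
For anyone who hasn’t run into it: an apiextensions.k8s.io/v1 CRD has to carry a structural openAPIV3Schema for every version. Roughly like the CronTab example from that page:

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: crontabs.stable.example.com
spec:
  group: stable.example.com
  scope: Namespaced
  names:
    plural: crontabs
    singular: crontab
    kind: CronTab
  versions:
  - name: v1
    served: true
    storage: true
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            properties:
              cronSpec:
                type: string
              replicas:
                type: integer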

I suspect many people will go mad trying to make heads or tails of that. Anyway, if I had to decide how to run a set of microservices in production I’d still go with something like Docker containers run as systemd services and HAProxy to load balance the traffic. Less automation for rolling upgrades, scaling and so on, but I wouldn’t be relying entirely on a system where it’s not even clear that I can describe the malfunction keeping all services from running!
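
The setup I have in mind is just a unit file per service; a sketch, where the unit name, port and everything else are made up for illustration:

# /etc/systemd/system/pdnsadmin.service (hypothetical)
[Unit]
Description=pdns-admin web UI in Docker
After=docker.service
Requires=docker.service

[Service]
# Remove any leftover container from a previous run; "-" ignores errors.
ExecStartPre=-/usr/bin/docker rm -f pdnsadmin
ExecStart=/usr/bin/docker run --rm --name pdnsadmin \
    -p 127.0.0.1:8080:80 \
    cacher.svealiden.se:5000/pdnsadmin:20210925
ExecStop=/usr/bin/docker stop pdnsadmin
Restart=always

[Install]
WantedBy=multi-user.target

Point HAProxy at port 8080 on each host and you have load balancing with no scheduler in sight.

I mean, I added podAntiAffinity to my deployment earlier: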

affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchExpressions:
        - key: app
          operator: In
          values:
          - pdns-admin-gui
      topologyKey: "kubernetes.io/hostname"

And these are the events all three pods recorded when I started the deployment:

  Warning  FailedScheduling  39m   default-scheduler  0/3 nodes are available: 3 node(s) didn't match pod affinity/anti-affinity rules, 3 node(s) didn't match pod anti-affinity rules.
  Warning  FailedScheduling  39m   default-scheduler  0/3 nodes are available: 3 node(s) didn't match pod affinity/anti-affinity rules, 3 node(s) didn't match pod anti-affinity rules.
  Normal   Scheduled         39m   default-scheduler  Successfully assigned default/pdnsadmin-deployment-856dcfdfd8-vmzws to kube01.svealiden.se
  Normal   Pulled            39m   kubelet            Container image "cacher.svealiden.se:5000/pdnsadmin:20210925" already present on machine
  Normal   Created           39m   kubelet            Created container pdnsadmin
  Normal   Started           39m   kubelet            Started container pdnsadmin

Obviously “Successfully assigned default/pdnsadmin-deployment-856dcfdfd8-vmzws to kube01.svealiden.se” was a good thing, but Kubernetes telling me that zero nodes were initially available? When the only requirement I set out was that pdns-admin-gui pods shouldn’t run on the same node? And there are three nodes? And I asked for three pods? Presumably it was some transient race while the scheduler placed the three pods one by one, but the event message does nothing to tell you that. That’s the sort of stuff that would make me very nervous if this were used for production. What if Kubernetes gets stuck in “All pods forbidden because reasons” mode?

This is also why I’m terrified of running Ceph for production applications. Multiple independent Ceph clusters? Okay, now we’re talking, but a single cluster? It’s just a matter of time before Ceph locks up on you and you have a week’s downtime while trying to figure out what the hell is going on.

The keen observer will ask “Aren’t you a massive fan of clusters?”, and that’s entirely correct. I’ve run Ceph for my own applications for just over two years and have Elasticsearch, MongoDB and MariaDB clusters set up. But the key point is disaster recovery. Clusters can be great for high availability, which is what I’m really a fan of, but clusters without an override for when the complex logic of the distributed system goes on the fritz are a huge gamble. If MongoDB gets confused I can choose a node and force it to become the primary. If I can get the others to rejoin afterwards, that’s fine; otherwise I’ll just have to blank them and rebuild. Same with MariaDB: I can kill the other two nodes, make the remaining one the master and take it from there. I don’t need any distributed algorithm to give me permission to bring systems back in a diminished capacity.
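
Those overrides look roughly like this. A sketch; which member you keep obviously depends on the situation:

# MongoDB: the documented “force reconfig” escape hatch, run in the
# shell on the node you want to keep. Member index 0 is situational.
cfg = rs.conf()
cfg.members = [cfg.members[0]]
rs.reconfig(cfg, {force: true})

# MariaDB Galera: bootstrap a fresh cluster from the surviving node.
galera_new_cluster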

By the way, nothing essential runs on my Ceph cluster. Recursive DNS servers, file shares, backups of file shares and so on all run on local storage in a multi-master or master/slave configuration. Ceph going down will disable some convenience functions, my Zabbix monitoring server, Prometheus, Grafana and so on, but I can live without them for a couple of hours while I swear angrily at Ceph. In fairness, I haven’t had a serious Ceph issue for (checking…) about a year now!

containerd doesn’t support pull-through caches

This means that microk8s (which is awesome!) doesn’t support fetching images through a pull-through cache. I’ve read that later versions of containerd do support it, but that’s not what microk8s is running. Oh well. I’ve now set my private registry to not be a pull-through cache, because it can’t be both that and an ordinary private registry. So now I pull images, tag them with my local registry as the source and push them there:

cjp@amd:~$ docker tag nginx cacher.svealiden.se:5000/nginx:20210925
cjp@amd:~$ docker push cacher.svealiden.se:5000/nginx:20210925
The push refers to repository [cacher.svealiden.se:5000/nginx]
fac15b2caa0c: Pushed
f8bf5746ac5a: Pushed
d11eedadbd34: Pushed
797e583d8c50: Pushed
bf9ce92e8516: Pushed
d000633a5681: Mounted from redis
20210925: digest: sha256:6fe11397c34b973f3c957f0da22b09b7f11a4802e1db47aef54c29e2813cc125 size: 157

Then microk8s can pull them. Maybe this would have gone more quickly without that drink three hours ago?
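
For the day microk8s ships a containerd that does support it, the newer per-registry host configuration looks roughly like this. A sketch, with my registry as the assumed mirror endpoint:

# /etc/containerd/config.toml: point the CRI plugin at host configs
[plugins."io.containerd.grpc.v1.cri".registry]
  config_path = "/etc/containerd/certs.d"

# /etc/containerd/certs.d/docker.io/hosts.toml
server = "https://registry-1.docker.io"

[host."https://cacher.svealiden.se:5000"]
  capabilities = ["pull", "resolve"]

containerd would then try the local mirror first and fall back to Docker Hub.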