Atom servers

Today I learned that the following workload is too much for an Intel C3558 Atom server with 32GB RAM:

  • 1 Ceph mon
  • 2 Ceph OSDs
  • One VM exporting 4TB storage space via CIFS
  • One VM running
    • MariaDB Galera
    • MongoDB
    • Cortex Metrics
    • Loki log management
    • Grafana
    • Keepalived
    • HAProxy
    • ProxySQL
    • Zabbix Web UI
    • PDNS recursive DNS server
    • Minio
    • Nomad
    • Consul
  • One VM exporting access to a CephFS instance via CIFS (CTDB)
  • One VM serving as a router/gateway
  • One VM running Squid and apt-cacher-ng
  • One VM running a Kubernetes master node
  • One VM running a Kubernetes worker node with
    • MetalLB
    • Kube-router
    • Kubernetes dashboard
    • Kube-metrics

As can be seen above, appserver03 buckled under the strain as soon as the Kubernetes master node went up on pve3. I even tried turning off cephfs03 (the CIFS export of CephFS using CTDB) to give pve3 some slack, but it wasn’t enough. The VM appserver01 handled the master node better, but when its Kubernetes worker node went up, load went bananas as well. You can even see pve1’s own load go up during the same period. Unsurprisingly, it’s CPU steal that causes the issue for appserver03, as a new VM competes for access to the underlying hardware:
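Steal is the time a guest’s virtual CPU was ready to run but had to wait because the hypervisor was busy serving something else, such as another VM. As a minimal sketch, assuming a Linux guest whose `/proc/stat` has a steal column (kernel 2.6.11 or later), the steal percentage can be computed from two samples of the aggregate `cpu` line. The function names are illustrative, and this is not necessarily how the numbers in this post were collected.

```python
import time


def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into integer jiffy counters."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(v) for v in fields[1:]]


def read_cpu_counters(path="/proc/stat"):
    # Assumption: a Linux guest, where the first line of /proc/stat is the 'cpu' total.
    with open(path) as f:
        return parse_cpu_line(f.readline())


def steal_percent(before, after):
    """Return steal as a percentage of all CPU time between two samples."""
    # Columns: user nice system idle iowait irq softirq steal guest guest_nice.
    # guest/guest_nice are already included in user/nice, so only the first 8 are summed.
    delta = [a - b for a, b in zip(after[:8], before[:8])]
    total = sum(delta)
    if total == 0:
        return 0.0
    return 100.0 * delta[7] / total


def measure_steal(interval=1.0, path="/proc/stat"):
    """Sample /proc/stat twice, `interval` seconds apart, and return the steal %."""
    before = read_cpu_counters(path)
    time.sleep(interval)
    after = read_cpu_counters(path)
    return steal_percent(before, after)
```

Calling `measure_steal()` inside a VM returns the share of the interval spent waiting on the hypervisor; sustained high values mean the host’s CPUs are oversubscribed.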

Steal was an even bigger problem until I changed HAProxy to send all traffic for Cortex and Loki to the most powerful server, which runs a Xeon-D processor. Now Cortex and Loki on the Atom servers only handle background tasks, not queries from Grafana or log ingestion. If the main server goes down, traffic fails over to them.
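The HAProxy configuration isn’t shown here. One common way to get this primary-with-failover behavior is HAProxy’s `backup` server keyword: backup servers only receive traffic when all non-backup servers in the backend are down, and by default only the first operational backup is used unless `option allbackups` is set. The Python model below is a simplified sketch of those semantics (it ignores weights, persistence cookies and health-check timing); the server names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    healthy: bool
    backup: bool = False  # models the HAProxy 'backup' keyword on a server line


def eligible_servers(servers, allbackups=False):
    """Simplified model of which servers HAProxy would balance across."""
    # Non-backup servers take all traffic while at least one of them is up.
    active = [s for s in servers if not s.backup and s.healthy]
    if active:
        return active
    # Otherwise fall back to healthy backups, in declaration order.
    backups = [s for s in servers if s.backup and s.healthy]
    if allbackups:
        return backups  # 'option allbackups': use every healthy backup
    return backups[:1]  # default: only the first operational backup
```

With a hypothetical Xeon-D server as the only non-backup entry and two Atom servers marked `backup`, all queries land on the Xeon-D until it fails its health checks.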

If the load hadn’t affected the behavior of the actually important NAS services, I might have kept going, but it did:

ctdb is the CephFS export; ceti is just an alias for smb1. Note that the response time displayed is the latest value, not the maximum. I saw latencies well above 2 seconds for a write-read-delete cycle.
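The tool behind the write-read-delete check isn’t named here. As a rough sketch, assuming the share is mounted locally, a single cycle can be timed like this; the function name and the mount point `/mnt/ctdb` in the usage note are hypothetical.

```python
import os
import time
import uuid


def write_read_delete(directory, payload=b"latency probe\n"):
    """Time one write-read-delete cycle in `directory`; returns seconds elapsed."""
    # Random file name so concurrent probes don't collide.
    path = os.path.join(directory, f".probe-{uuid.uuid4().hex}")
    start = time.monotonic()
    with open(path, "wb") as f:
        f.write(payload)
        f.flush()
        os.fsync(f.fileno())  # push the write through to the server, not just the client cache
    with open(path, "rb") as f:
        data = f.read()  # may be served from the client's page cache
    os.remove(path)
    elapsed = time.monotonic() - start
    if data != payload:
        raise OSError("read back different data than was written")
    return elapsed
```

For example, `write_read_delete("/mnt/ctdb")` returns the cycle time in seconds. Because the read may hit the local cache, the figure mostly reflects write and metadata latency on the share.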

Anyway… I’m just going to have to keep running all my Kubernetes masters and workers as VMs on my workstation, which has a 12-core AMD Ryzen processor and 64GB RAM. That isn’t really a problem.