Ceph quorum shenanigans

God I love the word “shenanigans”. I can’t tell you how many files on my work laptop include that word.

Anyway… So I moved some VMs around to upgrade pve2 and get rid of an annoying UEFI issue where the console won’t work. But I overloaded pve1, which promptly went and died. Well, rebooted… And then mon.pve1 couldn’t rejoin the cluster. I figured I could take that opportunity to upgrade Ceph, and the whole pve1 server, actually.

No dice. Okay, delete mon.pve1 and add it again? No dice. Proxmox’s Ceph tools don’t like things being wonky. After (looking at the time) roughly an hour and a half, I had deleted and re-added mon.pve1 several times and finally got it to work by also running “ceph mon add pve1 192.168.1.21” on one of the quorate nodes.
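For the record, here is a rough sketch of that sequence as shell. It only prints the commands instead of running them. The pveceph lines are an assumption about what “delete and add again” maps to on the command line, and the order is a guess; only the “ceph mon add” line comes straight from what worked above. The function name mon_readd_plan is made up for this note.

```shell
#!/bin/sh
# Hypothetical helper: prints (does not run) a plausible command sequence
# for re-adding a monitor that refuses to rejoin the quorum.
# Assumption: the pveceph steps and the ordering are a guess, not a verified recipe.
mon_readd_plan() {
    mon_id="${1#mon.}"   # "ceph mon add" takes the bare ID, so drop a leading "mon."
    mon_ip="$2"
    echo "pveceph mon destroy ${mon_id}"            # on the broken node
    echo "pveceph mon create"                       # on the broken node
    echo "ceph mon add ${mon_id} ${mon_ip}"         # on a quorate node
    echo "ceph quorum_status --format json-pretty"  # check who is in quorum
}

# Example: accepts either "pve1" or "mon.pve1" as the name.
mon_readd_plan mon.pve1 192.168.1.21
```

Printing the plan rather than executing it is deliberate: with a wonky cluster, it is safer to read each command before pasting it into the right node.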

Running “ceph mon add” by hand might seem obvious, but can you find it in the guide for adding and removing mons? https://docs.ceph.com/en/latest/rados/operations/add-or-rm-mons/

I can’t. Also, why is it ceph mon add pve1 192.168.1.21 and not ceph mon add mon.pve1 192.168.1.21? The “mon.” part is included everywhere else. But this is why I run Ceph at home: to learn this stuff when it’s only me getting annoyed. My bosses are running Ceph in production, the poor dears. Obviously not on my rinky-dink setup crammed into a cupboard, but still… Nerve-wracking stuff.

Well, I guess I’d better get to upgrading pve2 then… Slightly behind schedule.