Lots of unnecessarily complicated tech

2019-07-09

Home Lab – v1

Filed under: Uncategorized — admin @ 19:46

I like high availability, probably more than I should. Banking, phone systems and the electrical grid do a good job of it, but we have a lot of complex stuff at home nowadays, which makes it trickier to keep everything up and running reliably. It was more than ten years ago that I first tried to build a server setup that could keep things running even when individual servers failed or had to be brought down for maintenance. I didn't have enough hardware with virtualization support, so I had to use Xen (32-bit edition) and its paravirtualization.

I used DRBD for shared storage with, I think, OCFS2 on top of it. It didn't work very well. For various reasons I ended up with a single server running Solaris 11 and raidz storage, so at least I had data redundancy even if the system as a whole wasn't replicated. Later I moved to a master-slave setup with two identical Core i5-based “servers”, where the master node replicated data over to the slave using Btrfs snapshots. The filesystems used Btrfs RAID1 on both nodes, so there was a LOT of redundancy. There's a story involving raidz, a weird hard drive failure and a couple of days of worrying behind that two-server setup, each node with its own internal two-way replication.
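
For the curious, the replication was something like the sketch below, assuming the usual btrfs send/receive approach; the real script is long gone and the paths and host name here are made up:

    # Rough sketch of the nightly replication (paths and host name are made up).
    SNAP=/data/.snapshots/$(date +%F)
    PREV=/data/.snapshots/$(date -d yesterday +%F)

    # Read-only snapshot on the master...
    btrfs subvolume snapshot -r /data "$SNAP"

    # ...then an incremental send against yesterday's snapshot, received on the slave.
    btrfs send -p "$PREV" "$SNAP" | ssh slave btrfs receive /data/.snapshots/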

One of the servers in the master-slave setup gave up a few months ago, so it was time to replace it with something new, and by now I think the new setup is pretty much complete.

Hardware

Node name: pve1
Microtower Atom C3558
16 GB RAM
3.5″ hot-swap bays + 2 internal SATA connections
1× 250 GB NVMe
1× 250 GB SATA
1× 500 GB SATA
1× 2 TB 5400 RPM SATA
1× 4 TB 5400 RPM SATA
4× Gbit LAN
IPMI

Node name: pve2
Microtower Xeon D-1541
32 GB RAM
3.5″ hot-swap bays + 2 internal SATA connections
1× 250 GB NVMe
2× 250 GB SATA
1× 500 GB SATA
1× 2 TB 5400 RPM SATA
1× 4 TB 5400 RPM SATA
1× Gbit LAN, 2× 10GBase-T LAN
IPMI

Node name: pve3
Microtower Atom C3558
32 GB RAM
3.5″ hot-swap bays + 2 internal SATA connections
1× 250 GB NVMe
1× 250 GB SATA
1× 500 GB SATA
1× 2 TB 5400 RPM SATA
1× 4 TB 5400 RPM SATA
Gbit LAN
IPMI

Node name: nearline
Self-assembled Core i5 system
16 GB RAM
Tandberg LTO-3 tape drive
Hodgepodge of hard drives

Networking equipment:
Cisco RV082 router
Netgear LB2120 (for the backup internet connection)
HPE 1920S switch

The tiny little screen at the top right is a sort of crude monitoring system. I don't feel like running a big LCD screen just to show the load on my servers, so this tiny screen tells me whether any important hosts are down and what the load is on each Proxmox machine.
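
Something along these lines would do the polling (a sketch, assuming plain ping for reachability and /proc/loadavg over SSH for the load figures; the actual script and the display side aren't shown here):

    #!/bin/sh
    # Crude status poll: reachability via ping, load average via SSH.
    for host in pve1 pve2 pve3; do
        if ping -c 1 -W 1 "$host" >/dev/null 2>&1; then
            load=$(ssh root@"$host" cat /proc/loadavg | awk '{print $1, $2, $3}')
            echo "$host up, load: $load"
        else
            echo "$host DOWN"
        fi
    done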

Structure

The nodes pve1-pve3 form a Proxmox 5.4 cluster with Ceph Luminous as shared storage. The journals for all Ceph OSDs are stored on NVMe partitions, which took a while to set up since Proxmox's own Ceph tools don't want to do that. They say they do, but they don't.
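
One way to do it by hand on Luminous (a sketch, and not necessarily exactly what I did; device names are examples, not my actual layout) is to create the BlueStore OSDs with ceph-volume and point the DB at a pre-made NVMe partition:

    # Example only: BlueStore OSD on a SATA disk with its RocksDB/journal
    # on an NVMe partition. Device names are made up.
    ceph-volume lvm create --bluestore --data /dev/sdb --block.db /dev/nvme0n1p2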

Some storage is kept out of Ceph for reliability reasons. Basically, I think of Ceph as a single point of (unlikely) failure, so the virtual machines I want running even while I'm trying to figure out why Ceph refuses to work are stored on LVM-thin volumes instead.
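
Setting that up is only a couple of commands (a sketch; the volume group, thin pool and storage names are examples): carve out a thin pool with LVM and register it with Proxmox.

    # Example only: names of the VG, thin pool and storage entry are made up.
    lvcreate -L 200G -T pve/vmthin
    pvesm add lvmthin vmthin --vgname pve --thinpool vmthin --content images,rootdir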

The nearline machine (not shown below, since it is mostly turned off and so makes little sense to monitor) is also a manifestation of my distrust in Ceph. I used to rsync data over to Btrfs volumes once a day and then snapshot them, but the drives I put into the machine were junk, so that had to stop. Now I've got Bacula up and running again and therefore store backups on my cherished LTO tapes. *hugs*
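
The old nightly job was something like the two lines below (flags and paths are made up): a plain rsync into a Btrfs subvolume, followed by a read-only snapshot per day.

    # Roughly the old nightly job: mirror into a Btrfs subvolume, then keep
    # a read-only snapshot per day. Flags and paths are made up.
    rsync -aHAX --delete /srv/data/ /backup/current/
    btrfs subvolume snapshot -r /backup/current /backup/snapshots/$(date +%F)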

The Proxmox nodes use bonded network ports to connect to the HPE switch, which mainly serves to tie the cluster together but is also the core switch of the network. The HPE switch connects to my good old Cisco RV082 router, which in turn connects to the fiber modem that gives a nice 100 Mbit connection out. Nowadays a 4G modem is also connected to WAN2 as a fallback.
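
On the Proxmox nodes the bonding is just a normal Debian ifupdown stanza; something like the excerpt below (interface names, addresses and the bond mode are examples, not my exact configuration):

    # /etc/network/interfaces (excerpt). Interface names, addresses and the
    # bond mode are examples.
    auto bond0
    iface bond0 inet manual
        bond-slaves eno1 eno2
        bond-miimon 100
        bond-mode 802.3ad

    auto vmbr0
    iface vmbr0 inet static
        address 192.168.1.11
        netmask 255.255.255.0
        gateway 192.168.1.1
        bridge-ports bond0
        bridge-stp off
        bridge-fd 0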

The nodes with green links to the cloud symbol are stored on Ceph and can therefore be migrated between physical hosts while running. Some nodes are not shown in the graphic above; mostly they're testbeds like my CloudLinux install with WHM and cPanel, a copy of my Pacemaker cluster and so on.

Software

Proxmox

A Debian-based virtualization platform with cluster and HA support? Yes please… It has a great GUI and tight integration with Ceph. It's kind of a pain to install new SSL certificates, but it can be done. Basically, Proxmox is an alternative to VMware vSphere and whatever Xen offers nowadays. I wish it had a way of configuring fencing for cluster nodes that was integrated with its own HA functionality. It has pretty good built-in performance monitoring as well.
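
For reference, "can be done" boils down to dropping the new certificate into the cluster filesystem and restarting the proxy; the source file names below are placeholders, the target paths are the standard Proxmox ones:

    # Replace the web GUI certificate. Source file names are placeholders;
    # the destination paths are the standard Proxmox ones.
    cp fullchain.pem /etc/pve/local/pveproxy-ssl.pem
    cp privkey.pem /etc/pve/local/pveproxy-ssl.key
    systemctl restart pveproxy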

Ceph

So I took the plunge and started using Ceph. It wasn't entirely easy, since I already had my cluster set up to use GlusterFS. It's great to be able to move virtual machines from node to node with live migration, but you can't do that between separate shared storage systems, now can you? I can handle some downtime, though, and thought of it as an interesting experiment. Since my DNS and MySQL servers were already set up as clusters of virtual machines, those VMs could be shut down one at a time and recreated on a new cluster with Ceph as the backend, and the process could then be repeated one physical node at a time.
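
(Once the VMs sit on Ceph, moving a running one is a one-liner on the node it currently runs on; the VM ID and target node below are just examples.)

    # Live-migrate a running VM to another cluster node. The VM ID and the
    # target node are examples.
    qm migrate 101 pve2 --online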

I didn't strictly need to create a new cluster, but I figured I might as well go all the way and upgrade to Proxmox 5.4. All in all there were maybe 5-10 minutes of downtime inherent in the move, and 1-2 hours of downtime because I'm a klutz who configures two servers to use the same IP address and then wonders why things don't work so well.

Many decisions in this setup reflect my initial lack of trust in Ceph, but by now I've actually come to trust it quite a bit. It performed remarkably well when I screwed up the IP addresses and other things. I haven't encountered a split-brain situation yet, which is more than I can say for GlusterFS (note: when increasing the node count in a GlusterFS setup, you have to change the quorum levels manually…).
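
(The knobs I mean are the GlusterFS quorum options; a sketch of them below, where the volume name and values are examples. They don't retune themselves when the node count changes.)

    # The GlusterFS quorum knobs I mean; volume name and values are examples.
    gluster volume set myvol cluster.quorum-type fixed
    gluster volume set myvol cluster.quorum-count 2
    gluster volume set all cluster.server-quorum-ratio 51%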

Ceph also has its own monitoring system which is nice.
