Disaster recovery for small companies

Disaster recovery is a huge topic, and if you’re General Electric or any other huge, successful company then it’s really complicated. You have so much data in so many places that needs to be preserved, and just telling people to put it on their personal Dropbox is neither reliable nor secure. It’s a bit easier for smaller businesses since they have one web site and ten laptops to consider.

This is going to focus on the internet-related part, so web sites and email. For the love of God, please make backups of your laptops as well. It’s just not the point of this… well, whatever this is. Windows has good built-in backups nowadays so you only need a NAS and that’ll set you back like a few hundred Kajiggers even if you buy a pretty good one with multiple disks.

What I describe are tools which are readily available but not necessarily easy to use. One-click solutions to Disaster Recovery are expensive and often sketchy*. Unless you know how it works and why it will work even in a disaster, don’t trust it.

* Wow! I see Cloudflare has stopped offering a 1000% uptime guarantee for Enterprise Plans. Now it’s 100% with a 25x penalty-thing. I’m almost sad to see it go, it was such a great example of promising things that you know you can’t deliver but which you know you can afford to pay penalties for.

Web sites

So how to safeguard a web site? If you have your web site with a hosting company then they probably make backups but you had best check that. Don’t just ask them if they make backups. Test the backups. Download them and see if you can get your website up and running somewhere else. This serves multiple purposes:

  • Verifies that backups are in fact made.
  • Verifies that backups are available.
  • Makes sure you know what to do with them in an emergency.
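
A first, very small part of that test can be scripted. This is only a sketch, assuming the hosting company hands you a gzip-compressed tar archive; the function name check_backup_archive and the file names passed to it are made up for illustration. It does not replace actually bringing the site up somewhere else.

```shell
#!/bin/sh
# Sketch: sanity-check a downloaded site backup before you need it.
# Assumes a .tar.gz archive; function name and example paths are hypothetical.

check_backup_archive() {
    archive=$1
    shift

    # An empty or missing download is the most common silent failure.
    if [ ! -s "$archive" ]; then
        echo "FAIL: $archive is missing or empty" >&2
        return 1
    fi

    # Listing the archive makes tar read it end to end,
    # which catches truncated or corrupt files.
    listing=$(tar -tzf "$archive") || {
        echo "FAIL: $archive is not a readable tar.gz" >&2
        return 1
    }

    # Every remaining argument is a path that must be present in the archive.
    for wanted in "$@"; do
        if ! printf '%s\n' "$listing" | grep -qxF -e "$wanted"; then
            echo "FAIL: $wanted not found in $archive" >&2
            return 1
        fi
    done

    echo "OK: $archive"
}
```

Called as check_backup_archive site-backup.tar.gz index.php dump.sql, it only passes if the archive is readable and lists both files. The paths have to match the archive listing exactly, so an archive created from "." will list ./index.php rather than index.php.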

Note that you shouldn’t rely entirely on the backups made by your hosting company even if everything checks out. Disaster recovery isn’t about surviving a broken hard drive somewhere; that’s just your average Monday. A company being unable to get their entire storage system to work properly for weeks is where we need DR. There are quite a few companies in Sweden as I’m writing this that have been waiting to access their email accounts for weeks. It took that provider a week just to get their customers back to the point where they could send and receive emails again.

My personal favorites are legal problems. Like a group of people not bothering to transfer the assets of a bankrupt company (by buying them from the estate) but instead continuing to operate them, saying simply “Hey, we’ve got all the login credentials we need to run this stuff. Why would we go out of our way to make sure it’s owned by the right legal entity?”. Yeah, that doesn’t work out too well for their customers in the end. If you are told that the next step for you to get your ecommerce site back online is to contact the retired lawyer who presided over a bankruptcy case several years ago, you’re looking at a serious piece of downtime.

So don’t just have backups with the same company that hosts your web site. Keep stuff off site as well. If you get FTP or SSH access to your hosting account then you could script it, as demonstrated in this post that shows how to get all the data necessary to run a web site over from a hosting account to a virtual server so that downtime can be minutes rather than 17 weeks of legal wrangling.
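
For a rough idea of what such a script can look like, here is a minimal sketch. It assumes SSH access with key-based login, rsync on both ends, a MySQL or MariaDB database, and that mysqldump can log in on the host without prompting (for instance through a ~/.my.cnf file). The function name, host, directory and database names are all placeholders.

```shell
#!/bin/sh
# Sketch: pull the files and the database of a hosted site to a local directory.
# Every name passed in (host, web root, database, destination) is hypothetical.

pull_site() {
    remote=$1        # user@host of the hosting account
    remote_dir=$2    # web root on the host, e.g. public_html
    database=$3      # name of the site's database
    dest=$4          # local directory that receives the copy

    mkdir -p "$dest/files" || return 1

    # Mirror the web root; --delete keeps the copy identical to the source.
    rsync -az --delete -e ssh "$remote:$remote_dir/" "$dest/files/" || return 1

    # Dump the database on the host and stream it over SSH.
    # Assumes mysqldump can authenticate there without a prompt.
    ssh "$remote" "mysqldump --single-transaction $database" > "$dest/db.sql" || return 1
}
```

A call would be something like pull_site me@myhost public_html shop /srv/backup/shop. Since --delete mirrors deletions too, a wiped hosting account would also wipe the local copy on the next run, so keep dated copies or snapshots of the destination.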

If you don’t want to spend money on a virtual server 365 days a year just to have a backup that you can switch over to super-fast, then downloading the information locally and making sure you know roughly how to bring a site back online is a way to go. Disaster recovery can be allowed to take time because disasters are not expected to happen often. It’s about getting things back in less than weeks or months, or about getting things back at all.

Email

The IMAP protocol is great as it allows us to access email from multiple devices simultaneously. If configured correctly you can even see which emails have been sent from an account on a smart phone when you access it from a different device (folder mapping can be kind of wonky out of the box though). It is however very much an online system, IMAP. If the email server goes down you may not even be able to read the old emails in your inbox. That depends on how much your email client is storing locally.

What can really be a kick in the pants when there’s a big failure for an email provider is that if your email clients can connect to the server using the IMAP protocol and the server is empty, IMAP will dutifully empty your inbox. Yeah, that’s not great, but would you rather have old copies of deleted emails lying around on various devices just so that you can avoid this corner case in a disaster recovery situation?

Anyway, what I do is use Imapsync to keep a set of local copies of important email accounts. Note that I have Google Workspace as my email provider and that I don’t trust them enough with my completely unimportant emails to let them be the only safeguard against data loss. Of course Imapsync is kind of not-so-user-friendly. It’s meant for situations where you need to move thousands of email accounts between providers or when making automated backups.

I run this once a day:

/usr/bin/imapsync --host1 imap.gmail.com --user1 my-gmail-address --password1 SUPERSECRETAPPPASSWORD --host2 localhost --user2 myemailbackup --password2 SUPERSECRETLOCALPASSWORD --maxbytespersecond 300000 --gmail1
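
One thing to be aware of with a command like this is that passwords given as arguments can typically be seen by other users on the same machine (through ps) while it runs. Imapsync also accepts --passfile1 and --passfile2, which read the password from a file instead. A sketch of the same sync wrapped in a function follows; the function name and the file locations are my assumptions, not part of the setup described here.

```shell
#!/bin/sh
# Sketch: the same daily sync, with passwords read from files instead of
# the command line. Function name and file locations are assumptions.

sync_mailbox() {
    remote_user=$1                      # account at the provider
    local_user=$2                       # Dovecot user on the backup server
    secrets_dir=${3:-/root/.imapsync}   # hypothetical directory, chmod 700

    # Each .pass file holds one password on one line and should be chmod 600.
    imapsync \
        --host1 imap.gmail.com --user1 "$remote_user" \
        --passfile1 "$secrets_dir/$remote_user.pass" \
        --host2 localhost --user2 "$local_user" \
        --passfile2 "$secrets_dir/$local_user.pass" \
        --maxbytespersecond 300000 \
        --gmail1
}
```

With that in a script, each account becomes one line such as sync_mailbox my-gmail-address myemailbackup, and the script can be run from cron once a day.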

Since I have a virtual machine doing backup tasks generally I installed Dovecot on it so as to not have to pay someone to hold my email backups that are just there for disaster recovery situations. On Ubuntu it’s really easy:

apt install dovecot-imapd

Then create local users that will serve as destination email accounts:

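useradd -m myemailbackup   # -m creates the home directory that will hold ~/Maildir
passwd myemailbackup       # the password imapsync later uses as --password2
useradd -m anotherbackup
passwd anotherbackup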

I don’t like mbox or mdbox much, at least not for these applications and that was the one change I had to make to the default dovecot setup. Maildir is my kind of format:

mail_location = maildir:~/Maildir:LAYOUT=fs

Dovecot uses the local user’s home directory by default and accepts whatever password the user has for logging on to the computer.

Knocking it up a notch

I can’t get by without Btrfs snapshots. Like in the case of email backups for instance. The maildir folders mentioned above? Stored on a Btrfs partition on my backup server.

root@backup:~# ls -lh /srv/storage/Snapshots/myemailbackup/
drwxr-xr-x 1 myemailbackup myemailbackup 284 Dec 16 23:40 myemailbackup@auto-2020-12-16-2352_14d
drwxr-xr-x 1 myemailbackup myemailbackup 284 Dec 16 23:40 myemailbackup@auto-2020-12-17-0500_14d
drwxr-xr-x 1 myemailbackup myemailbackup 324 Dec 18 01:09 myemailbackup@auto-2020-12-18-0500_14d
drwxr-xr-x 1 myemailbackup myemailbackup 324 Dec 18 01:09 myemailbackup@auto-2020-12-19-0500_14d
drwxr-xr-x 1 myemailbackup myemailbackup 324 Dec 19 14:42 myemailbackup@auto-2020-12-20-0500_14d
drwxr-xr-x 1 myemailbackup myemailbackup 324 Dec 19 14:42 myemailbackup@auto-2020-12-20-0830_4w
drwxr-xr-x 1 myemailbackup myemailbackup 324 Dec 19 14:42 myemailbackup@auto-2020-12-21-0500_14d
drwxr-xr-x 1 myemailbackup myemailbackup 324 Dec 19 14:42 myemailbackup@auto-2020-12-22-0500_14d
drwxr-xr-x 1 myemailbackup myemailbackup 324 Dec 19 14:42 myemailbackup@auto-2020-12-23-0500_14d
drwxr-xr-x 1 myemailbackup myemailbackup 324 Dec 19 14:42 myemailbackup@auto-2020-12-24-0500_14d
drwxr-xr-x 1 myemailbackup myemailbackup 324 Dec 19 14:42 myemailbackup@auto-2020-12-25-0500_14d
drwxr-xr-x 1 myemailbackup myemailbackup 324 Dec 19 14:42 myemailbackup@auto-2020-12-26-0500_14d

This takes care of the scenario mentioned above, where you run IMAP against a server that thinks you have no email and all your local copies are deleted. Each snapshot shows the state of each account at that point but uses only the amount of storage necessary for the difference from the other snapshots. To go completely off the rails I also send these snapshots to a secondary server, mostly because I had already coded that functionality into the script and because this is just a few gigabytes of data. My script is a horror that makes programmers intentionally blind themselves but there are better variants: https://ownyourbits.com/2017/12/27/schedule-btrfs-snapshots-with-btrfs-snp/
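
The core of any such script is a single btrfs command. A minimal sketch, assuming the mail directory is itself a Btrfs subvolume; the function name, the paths and the retention tag are placeholders, and pruning of old snapshots is left out entirely:

```shell
#!/bin/sh
# Sketch: take a read-only Btrfs snapshot named like the listing above.
# Assumes the source directory is a Btrfs subvolume; paths, function name
# and retention tag are hypothetical.

snapshot_mail() {
    source_subvol=$1      # e.g. /srv/storage/myemailbackup
    snapshot_dir=$2       # e.g. /srv/storage/Snapshots/myemailbackup
    retention=${3:-14d}   # only a label in the name; pruning is not shown

    name=$(basename "$source_subvol")
    stamp=$(date +%Y-%m-%d-%H%M)
    target="$snapshot_dir/$name@auto-${stamp}_$retention"

    # -r makes the snapshot read-only, so a later sync cannot alter it.
    btrfs subvolume snapshot -r "$source_subvol" "$target"
}
```

A read-only snapshot is also what btrfs send expects as input, so the same snapshot can be piped to btrfs receive on a second machine over SSH.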

In either case, you have all your emails stored locally in case of disaster striking your email provider. You can get a new email provider and Imapsync your data from your local server up to their servers. In a couple of hours you’ll be back up and running and not having to worry about how long it will take your previous provider to sort out their problems. As mentioned earlier I’ve seen that take weeks for companies that I thankfully don’t work for.

Large scale data storage

Some web sites have huge amounts of data even though the company running them is quite small. News and fashion sites seem to be keen on lots of high-resolution pictures and video. It might not be viable to store that on your own network. Or maybe you simply want a safer storage solution. Amazon S3 is pretty great for these applications. While I use my own network as a sort of offsite backup location for email hosted externally, I can’t very well be the offsite backup location for my own house. Therefore I use Amazon S3 whenever that need arises.

You can get the aws command line tool for most operating systems to easily copy data to S3 (and access their other services):

curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
unzip awscliv2.zip
cd aws/
less install  # <-- Read what it does
./install

You will need to add credentials that have access to the right S3 buckets to the aws credentials file:

root@samba01:~# cat .aws/credentials
[default]
aws_access_key_id = SECRETID
aws_secret_access_key = SECRETKEY

Then it’s just a matter of running aws s3:

aws s3 cp localfile.tar.gz s3://mybackupbucket/Documents/

What I really like is that I can assign lifecycle rules that push data out to their Glacier storage tier after a while.
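
Such a rule can be set in the S3 console, or from the command line with aws s3api. The following is a sketch rather than my actual configuration: the bucket name, the rule ID, the file name and the 7-day delay are assumptions chosen for illustration.

```shell
#!/bin/sh
# Sketch: a lifecycle rule that moves objects to Glacier after N days.
# Bucket name, rule ID, file name and the delay are assumptions.

write_glacier_rule() {
    days=$1
    outfile=$2
    # An empty prefix makes the rule apply to every object in the bucket.
    cat > "$outfile" <<EOF
{
  "Rules": [
    {
      "ID": "archive-to-glacier",
      "Status": "Enabled",
      "Filter": { "Prefix": "" },
      "Transitions": [
        { "Days": $days, "StorageClass": "GLACIER" }
      ]
    }
  ]
}
EOF
}

apply_glacier_rule() {
    bucket=$1
    rulefile=$2
    # Note: this replaces any lifecycle configuration already on the bucket.
    aws s3api put-bucket-lifecycle-configuration \
        --bucket "$bucket" \
        --lifecycle-configuration "file://$rulefile"
}
```

Running write_glacier_rule 7 lifecycle.json followed by apply_glacier_rule mybackupbucket lifecycle.json would keep a week of uploads immediately available and archive everything older.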

I currently have some 1.4 TB of data in Amazon S3 with the vast majority being in Glacier. This costs me around $20 a month, which is a very reasonable price for offsite backups. It’s easier than tapes, and offsite is obviously way better than simple nearline (a term for storage that is not immediately accessible). I really like tapes but they are way more of a hassle, so now that I have a 100 Mbit fiber connection I mostly use that. I have the bulk of the data on tape anyway, which will speed up restores. The only data I have to get from S3 in case of all my servers shorting out at the same time is the last few months.

The downside of Glacier is that it takes time to restore but consider – if you will – that you put a backup of your 5 GB database into S3 every day. Which SQL-dump are you most likely to need? One from the 7 days that are immediately available or one from the X backups that take a few hours to get? Either way, you get your data back and disaster recovery is successful.
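
Getting an archived object back is a two-step affair: first ask S3 to restore it, then download it as usual once the restore has finished. A sketch of the first step, with a made-up function name and the retrieval tier and number of days picked only as examples:

```shell
#!/bin/sh
# Sketch: ask S3 to make a Glacier-archived object downloadable again.
# Function name, default days and the "Standard" tier are assumptions.

request_glacier_restore() {
    bucket=$1
    key=$2
    days=${3:-7}   # how long the restored copy stays available

    aws s3api restore-object \
        --bucket "$bucket" \
        --key "$key" \
        --restore-request "{\"Days\":$days,\"GlacierJobParameters\":{\"Tier\":\"Standard\"}}"
}
```

After something like request_glacier_restore mybackupbucket Documents/localfile.tar.gz, aws s3api head-object on the same key reports the restore status, and when it is done a normal aws s3 cp fetches the file.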

By the way, I don’t trust Amazon’s encryption. Not their server-side encryption, not their client-side encryption. I do the encryption using tools I have installed and that have nothing to do with Amazon:

gpg -v --batch --passphrase-file path_to_passphrase_file -c filename.tar.gz

This gives me a filename.tar.gz.gpg file that can be sent to S3 as is. Note that this isn’t some specific gripe I have against Amazon, it’s the same with all American companies. Well, it would apply to Russian and Chinese companies as well but for obvious reasons they aren’t on the table. My data is not super sensitive but I’d rather encrypt it than not.
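
An encrypted backup is only worth something if you can decrypt it again, so the reverse direction deserves a test too. A minimal sketch with a hypothetical function name; the --pinentry-mode loopback option is an assumption that applies to GnuPG 2.1 and later, where it lets gpg take the passphrase from the file without a prompt, and should be dropped on older versions.

```shell
#!/bin/sh
# Sketch: decrypt a symmetrically encrypted backup using the same
# passphrase file. Function name is hypothetical; --pinentry-mode loopback
# assumes GnuPG 2.1 or later.

decrypt_backup() {
    passfile=$1
    encrypted=$2   # e.g. filename.tar.gz.gpg

    case $encrypted in
        *.gpg) ;;
        *) echo "expected a .gpg file, got: $encrypted" >&2; return 1 ;;
    esac

    # Strip the .gpg suffix to get the name of the decrypted file.
    gpg --batch --pinentry-mode loopback \
        --passphrase-file "$passfile" \
        --output "${encrypted%.gpg}" \
        --decrypt "$encrypted"
}
```

Running decrypt_backup path_to_passphrase_file filename.tar.gz.gpg should give back filename.tar.gz. Keep a copy of the passphrase file somewhere other than the servers being backed up, or the backups are useless in exactly the situation they exist for.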