Offsite backup solution

I want an offsite backup solution, specifically one that can upload to Amazon S3 so that I can put stuff in Amazon Glacier using S3's lifecycle management. So Duplicati? Uhm… It’s built on C# for the CLR, ported to Linux, and… Just look at the dependencies when installing it! So that’s not happening. I tried it anyway, but it filled up my disk with cached files and things were just terrible.

So maybe Bacula? Would have been great but I couldn’t quite set it up. There are commercial tools but that’s not my style. I wrote a Python script but that fell apart. Let’s not get into the details but the fact that I can’t write proper software without Java-style constraints and type checking may have played a part.

So I rewrote it using Java and Hibernate to be able to store the data in an SQL database. Because that’s my style. It leaves something to be desired in terms of efficiency. See if you can spot the moment where I started two consecutive runs of the offsite backup software.

And now I got this:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space

Oh, ffs… I ran the VM with 1 GB of RAM. That probably wasn’t a good move. 3 GB should be okay though. Yes! That worked.

So the neat thing about this solution is that it has great disaster recovery properties. All you need to get your data back is access to the files stored in Glacier, the encryption password and the tar and gpg utilities. No custom storage formats. There’s a file in each tar.gz that lists which files were deleted between the previous backup run and the current one. So it’s not hard to bring the data back to exactly the state it was in when a backup ran (deletions included) or to recover every single file that has ever been present in the backups.
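To make the "exact state" restore a bit more concrete, here is a rough Python sketch of the replay idea. It is not the actual tool (that one is Java), and it assumes things the description above does not spell out: the archives have already been decrypted with gpg, they are passed in oldest first, and the deletion list is a plain text file with one relative path per line. The name `deleted-files.txt` is made up for illustration.

```python
import shutil
import tarfile
from pathlib import Path

# Hypothetical manifest name; the real archives may call it something else.
DELETED_LIST = "deleted-files.txt"


def restore_snapshots(archives, target):
    """Replay decrypted tar.gz archives, oldest first, into target."""
    target = Path(target).resolve()
    target.mkdir(parents=True, exist_ok=True)
    for archive in archives:
        with tarfile.open(archive, "r:gz") as tar:
            members = tar.getmembers()
            # First remove everything this run recorded as deleted.
            for member in members:
                if member.name != DELETED_LIST:
                    continue
                listing = tar.extractfile(member).read().decode("utf-8")
                for rel in filter(None, listing.splitlines()):
                    (target / rel).unlink(missing_ok=True)
            # Then write the new and modified files on top.
            for member in members:
                if member.name == DELETED_LIST or not member.isfile():
                    continue  # this sketch only handles regular files
                dest = (target / member.name).resolve()
                if target not in dest.parents:
                    raise ValueError(f"refusing to write outside target: {member.name}")
                dest.parent.mkdir(parents=True, exist_ok=True)
                with tar.extractfile(member) as src, open(dest, "wb") as dst:
                    shutil.copyfileobj(src, dst)
```

Stopping the replay at a given archive gives the state as of that run. Skipping the deletion step instead leaves every file that was ever backed up, each in its latest version.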

Downsides? Well, making a MySQL instance struggle to keep up with all the queries and inserts is not a plus. It stores data on an SSD after all. And the fact that the code is a horrendous mess also has some drawbacks. There’s no facility for compaction of snapshots… Well, it’s not for everyone.

One additional benefit though is that it is very straightforward to verify that chunks of backups can be restored to exactly the state they were in when indexed, because there’s a hashsum stored for each file for every indexing run (new hashsums are only generated for new and modified files or my computers would run hot all day long). I’ll have to get around to implementing that verification feature one of these days.
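For what it's worth, the check itself could be as small as the following Python sketch. It assumes SHA-256 (I only said "hashsum" above, so the algorithm is a guess on the example's part), and the `expected` dictionary of relative path to hex digest stands in for whatever query would pull the stored hashsums for one indexing run out of the database. Both function names are made up for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files do not have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(root, expected):
    """Compare restored files under root against stored hashsums.

    expected maps relative path -> hex digest (a stand-in for the database).
    Returns two sorted lists: paths that are missing and paths whose
    content does not match the stored digest.
    """
    root = Path(root)
    missing, mismatched = [], []
    for rel, want in sorted(expected.items()):
        path = root / rel
        if not path.is_file():
            missing.append(rel)
        elif sha256_of(path) != want:
            mismatched.append(rel)
    return missing, mismatched
```

Two empty lists mean the restored tree matches the index for that run. Anything else says exactly which files to go looking for.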