A confession: I’ve never been very good about backing up my data. Yes, I’ve had one external hard drive or another for more than a decade, but my backup plan for most of that time was to drag some folders to the closest USB drive every few months, or when I was migrating to a new computer.

For example, I would literally drag and drop – using the GUI – my “code” folder from my main hard drive to a folder on my external hard drive called “back-ups”, and date it like “2020-12-01-back-up-code”. Every few months I’d make a new one, then maybe delete the oldest one. I could have used a well-written cp command to do this, definitely adding the -R flag so it works recursively. This would also serve as a “restore” command when used in the opposite direction.
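For the record, the dated copy I was doing by hand could be scripted with cp along these lines. This is a minimal sketch: the function name dated_backup is hypothetical, and the folder naming mirrors the “back-ups” example above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy a folder into a dated back-up folder,
# e.g. ~/code -> <dest>/2020-12-01-back-up-code
dated_backup() {
  local src="$1" dest_root="$2"
  local target
  target="$dest_root/$(date +%F)-back-up-$(basename "$src")"
  # Refuse to overwrite: if the target already exists, cp -R would
  # copy *into* it instead of creating a fresh snapshot.
  if [ -e "$target" ]; then
    echo "dated_backup: $target already exists" >&2
    return 1
  fi
  # -R copies recursively; -p preserves modification times and permissions.
  # Print the new back-up's path only if the copy succeeded.
  cp -Rp "$src" "$target" && echo "$target"
}
```

Something like `dated_backup ~/code /media/$USER/external_harddrive/back-ups` makes a new copy; restoring is the same `cp -Rp` with source and destination swapped.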

This approach is simple and easy to perform and understand: you move the files you want to save in one direction, and move them back in the other to restore, and you can always fall back to an all-GUI procedure. But it has some real downsides. First, the external hard drive would need two, three, or four times as much space as whatever I was backing up, since it held several full copies. Second, each back-up would start from scratch, taking hours. And third, sometimes I'd notice that some files just hadn't been transferred at all!

There had to be a better way! But I didn’t want to jump into a do-it-all solution. Which brought me to rsync, “an open source utility that provides fast incremental file transfer.”

A first step in the right direction: Rsync

Basically, rsync is a command line tool that does what my drag-and-drop method did, only smarter. Instead of starting from scratch with each snapshot, rsync lets you keep updating the same snapshot incrementally. That means that if you’ve only modified or added one or two files since your last backup, you’ll only be transferring those changes.

If you’re interested, there are probably great rsync tutorials a search away. I’m not an expert, so for now let’s just use one of my rsync commands from ~/.bashrc as an illustrative example:

rsync -ar --delete /home/sschlinkert/Documents /media/sschlinkert/external_harddrive/back-ups-rsync/

This command sends a copy of my Documents directory to a back-up directory. Because it updates incrementally, it’s much better than removing an old backup and copying over the whole Documents directory every time I want to back up. Instead, rsync looks for new and modified files and updates just those in the back-up directory. (-a, “archive mode”, already implies -r, so the -r is redundant but harmless. The optional --delete flag removes files from the back-up that no longer exist in the source.)
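Because --delete removes files from the back-up, it can be worth previewing a run first. Here is a minimal sketch using rsync’s --dry-run and --itemize-changes flags; the function names are hypothetical, and the paths are the ones from the command above.

```shell
#!/usr/bin/env bash
# Paths from the rsync command above.
SRC="/home/sschlinkert/Documents"
DEST="/media/sschlinkert/external_harddrive/back-ups-rsync/"

# Hypothetical helper: show what a back-up run *would* do without changing
# anything. --itemize-changes prints one line per file to send or delete.
preview_backup() {
  rsync -a --delete --dry-run --itemize-changes "$SRC" "$DEST"
}

# Hypothetical helper: the real run. -a ("archive") already implies -r.
run_backup() {
  rsync -a --delete "$SRC" "$DEST"
}
```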

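To restore data archived this way, I think I’d just run rsync -a in the reverse direction. rsync verifies each file it transfers with a whole-file checksum, so I can be fairly confident everything made it over. (By default, rsync decides *which* files to send by comparing size and modification time; the -c/--checksum flag makes it compare checksums of every file instead.)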
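A restore along those lines might look like the sketch below (I haven’t needed it yet). One detail worth knowing: because the source in the back-up command has no trailing slash, rsync copied the Documents directory itself, so the copy lives at back-ups-rsync/Documents. The function name and the default destination are assumptions.

```shell
#!/usr/bin/env bash
# Where the back-up command above put the copy of Documents.
BACKUP="/media/sschlinkert/external_harddrive/back-ups-rsync/Documents"

# Hypothetical helper: copy the backed-up Documents directory back into a
# parent directory (default: /home/sschlinkert), i.e. the reverse direction.
restore_from_rsync() {
  local parent="${1:-/home/sschlinkert}"
  # No --delete, so files created since the last back-up are left alone.
  # Adding --checksum would compare file contents instead of size + mtime.
  rsync -a "$BACKUP" "$parent/"
}
```

Running `restore_from_rsync /tmp/restore-test` first is a cheap way to inspect the back-up before touching the real home directory.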

Downsides to this rsync approach

The big downside here is that we only ever have one “snapshot” of data to recover from at any one time. True, it’s by definition the latest snapshot, but it feels a bit scary putting all our eggs in one basket. Plus, I knew rsync wasn’t really meant for backing up whole home directories, which I increasingly wanted to do. It also doesn’t offer any encryption, which wasn’t a big deal for me but might be for you!

This week, looking for a project, I decided to explore some more robust back-up options.

After asking on Mastodon for recommendations and searching around a bit, I decided to try restic, a “modern backup program” that encrypts back-ups by default. It’s not clear to me whether restic also compresses your files, so if you need compression you may want to look elsewhere.

(Re: use of encryption: I won’t get into my personal threat model here, for security reasons, but in general I believe encryption is good, and it costs me little to store yet another passphrase in my password manager. It’s probably a good idea to write down your restic password somewhere physically secure. Restic’s docs have a section on its threat model, if you’re interested.)

Using restic

Here’s the official user’s guide from restic, which I found pretty helpful. Since I’m just putting backups onto USB devices, I only need to follow the instructions for a “local” backup.

Installing restic on Ubuntu 20.04

sudo apt install restic

Let’s quickly check the version. Running restic version prints:

restic 0.9.6 compiled with go1.12.12 on linux/amd64

A little outdated (0.12.1 is on GitHub as I write this), but it’ll do.

Getting set up with restic

Now let’s do some backing up!

First, let’s try to gain a bare-bones conceptual understanding of how restic works. Restic has a concept called “repositories”, which is where your backup(s) will live (so like, an external hard drive, a USB stick or a cloud service). From the docs:

The place where your backups will be saved is called a “repository”. This chapter explains how to create (“init”) such a repository. The repository can be stored locally, or on some remote server or service.

Basically we use restic to “pull” data into these backup repositories. So for me, my repository (singular for now) will be on my external hard drive (for simplicity, let’s pretend I only have one of those…). The data I want to back-up to this repository is on my laptop’s hard drive.

Initializing a restic repository

OK, let’s initialize one of these repositories. On my external hard drive, I ran:

mkdir /media/sschlinkert/external_harddrive/restic-repo
restic init --repo /media/sschlinkert/external_harddrive/restic-repo

Restic now asks us to create a password for this back-up “repo” (restic uses encryption). We’ll be asked to enter the new password a second time to confirm.
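Typing -r and the password for every command gets old. restic also reads the repository location and a password file from the environment variables RESTIC_REPOSITORY and RESTIC_PASSWORD_FILE. A minimal sketch, assuming a hypothetical password file at ~/.config/restic/password; the trade-off is that anyone who can read that file can open the repository.

```shell
#!/usr/bin/env bash
# Tell restic where the repository lives, so "-r ..." can be omitted.
export RESTIC_REPOSITORY="/media/sschlinkert/external_harddrive/restic-repo"
# Hypothetical location: a file containing only the repository password.
# Keep it private, e.g. chmod 600 "$HOME/.config/restic/password".
export RESTIC_PASSWORD_FILE="$HOME/.config/restic/password"
```

With those set (say, in ~/.bashrc), commands like `restic snapshots` or `restic check` should run without extra flags or a password prompt.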

Doing our first backup

Finally, time to back up some data. Following the docs, I composed this command using the backup subcommand to back up my entire home directory:

restic -r /media/sschlinkert/external_harddrive/restic-repo --verbose backup /home/sschlinkert/

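We take subsequent snapshots by running the exact same command at later times. In other words, this is the command you’d run every night or week to keep the backup up to date. Each run creates a new snapshot, but restic only stores data that isn’t already in the repository, so later runs typically take far less time and space than the first.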

Note that we can exclude files by name or pattern. Check the docs for more information.

Ensure a snapshot was created

We can do a quick check to see our first snapshot by running:

restic -r /media/sschlinkert/external_harddrive/restic-repo snapshots

For me, this command prints:

repository f96d340e opened successfully, password is correct
ID        Time                 Host                  Tags        Paths
----------------------------------------------------------------------------------
7ea938aa  2021-10-26 19:54:53  sschlinkert-Oryx-Pro              /home/sschlinkert
----------------------------------------------------------------------------------
1 snapshot

Check integrity of a repo

One thing that’s nice about restic is that you can check the state or “health” of the backup.

restic -r /media/sschlinkert/external_harddrive/restic-repo check

This should output a block of text ending with: no errors were found. Awesome!
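By default, check verifies the repository’s structure and metadata but doesn’t read back every piece of backed-up data. The check subcommand also accepts --read-data, which does, at the cost of reading the whole repository from the drive. A sketch of both as hypothetical helper functions; it assumes, as far as I know, that restic exits with a non-zero status when check finds problems.

```shell
#!/usr/bin/env bash
REPO="/media/sschlinkert/external_harddrive/restic-repo"

# Hypothetical helper: quick check of the repository structure and metadata.
quick_check() {
  restic -r "$REPO" check
}

# Hypothetical helper: slower check that also reads and verifies all
# backed-up data, which can catch corruption on the drive itself.
full_check() {
  restic -r "$REPO" check --read-data
}

# Hypothetical helper: turn the exit status of the quick check into a message.
report_check() {
  if quick_check; then
    echo "restic check passed"
  else
    echo "restic check FAILED" >&2
    return 1
  fi
}
```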

Now try restoring our data!

Now let’s say something bad has happened and we need to restore our files from this back-up repo.

Our files aren’t exactly just sitting in a directory, as they are when using a tool like rsync. (This is a bit of a downside for restic, but it’s fine.) Instead, we have to use restic’s restore subcommand.

First, we copy the ID of the snapshot we want to restore from the output of the snapshots command. Then we make a new directory to restore to, and restore into it using restic’s restore subcommand:

mkdir ~/Documents_restored
restic -r /media/sschlinkert/external_harddrive/restic-repo restore 7ea938aa --target ~/Documents_restored

This’ll take a while, but when it’s done our data should be restored to the location we specified, ~/Documents_restored. At that point, we can do a sanity-check with:

ls ~/Documents_restored
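Since this snapshot holds my whole home directory, restoring everything just to get one folder back is overkill. The restore subcommand accepts --include to restore only matching paths. A sketch, assuming the snapshot stores absolute paths like /home/sschlinkert/Documents (`restic -r REPO ls 7ea938aa` lists the paths, which is worth checking first); the function name is hypothetical. restic recreates the directory structure under the target, so the files may land in a nested path such as ~/Documents_restored/home/sschlinkert/Documents.

```shell
#!/usr/bin/env bash
REPO="/media/sschlinkert/external_harddrive/restic-repo"
SNAPSHOT="7ea938aa"   # ID copied from the output of `restic snapshots`

# Hypothetical helper: restore only the Documents folder from the snapshot.
# The first argument is the restore target (default: ~/Documents_restored).
restore_documents_only() {
  local target="${1:-$HOME/Documents_restored}"
  mkdir -p "$target"
  restic -r "$REPO" restore "$SNAPSHOT" \
    --target "$target" \
    --include "/home/sschlinkert/Documents"
}
```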

Automatically using the “latest” snapshot

You can also have restic use the “latest” snapshot, but I’m a bit confused about how it decides which snapshot to use if the same repo holds snapshots of completely different data, like Music/ and Movies/. As far as I can tell, without any filters latest simply means the most recent snapshot in the repo, whatever its paths. In that case each snapshot would have a unique “path”, so maybe best practice when you need or want to use the latest keyword is to explicitly specify the path to the data with --path?

restic -r /media/sschlinkert/external_harddrive/restic-repo restore latest --target ~/Documents_restored --path "/home/sschlinkert"

While I could see myself scripting a backup call somewhere, I don’t think I’ll be automating or scripting a restore any time soon, so I’ll probably give a specific snapshot ID when the time comes to restore my data (praying to the restic gods that it goes smoothly).

Day-to-day backing up with restic

Phew! We’re ready for day-to-day life with restic.

But restic -r /media/$USER/external_harddrive/restic-repo --verbose backup /home/$USER/ is a bit of a mouthful to type from memory. I’m sure some folks set up a cron job to run their restic backups. I might write a bash function instead, either next to my restic repo or directly in my .bashrc.
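For example, a bash function in ~/.bashrc could wrap the whole command and refuse to run when the drive isn’t mounted. This is a minimal sketch: the function name restic_backup and the override variable RESTIC_BACKUP_REPO are hypothetical, and the exclude file is the one described under “Excluding files”.

```shell
#!/usr/bin/env bash
# Hypothetical ~/.bashrc function for day-to-day restic back-ups.
restic_backup() {
  # Default repo path from this post; RESTIC_BACKUP_REPO (hypothetical) overrides it.
  local repo="${RESTIC_BACKUP_REPO:-/media/$USER/external_harddrive/restic-repo}"
  # Fail early if the external drive isn't mounted.
  if [ ! -d "$repo" ]; then
    echo "restic_backup: $repo not found; is the drive mounted?" >&2
    return 1
  fi
  restic -r "$repo" --verbose backup \
    --exclude-file="$HOME/restic-excludes.txt" \
    "$HOME"
}
```

After that, `restic_backup` is the only thing left to remember.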

Excluding files

Since I’ve decided to back up my entire home directory, there are quite a few files and directories I can safely exclude from back-ups. You can read about all the exclude options in the documentation, but I decided to use the --exclude-file flag, which excludes the items listed in a given file. For now, that file is just ~/restic-excludes.txt, and its contents are:

/home/$USER/.bundle
/home/$USER/.cache
/home/$USER/.cargo
/home/$USER/.gem
/home/$USER/.local/share/flatpak
/home/$USER/.local/share/Steam
/home/$USER/.local/share/Trash
/home/$USER/.local/share/baloo
/home/$USER/.mozilla
/home/$USER/.npm
/home/$USER/.nvm
/home/$USER/.pyenv
/home/$USER/.rbenv
/home/$USER/.rustup
/home/$USER/.var/app/org.chromium.Chromium/cache/
/home/$USER/.zoom
/home/$USER/snap

(Note that, according to the docs at the time of writing, restic does NOT expand ~ to your home directory in this exclude file, but it does expand environment variables like $USER.)
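As far as I know, a typo in this file goes unnoticed: restic just backs up the directory anyway. Here’s a small sketch that reads the exclude file, substitutes $USER the way the file expects, and reports which entries exist on disk. The function name is hypothetical, and it only handles literal paths; entries with wildcard patterns would need globbing this sketch doesn’t do.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which entries in an exclude file exist.
check_excludes() {
  local file="${1:-$HOME/restic-excludes.txt}" line path
  # "|| [ -n "$line" ]" also handles a last line with no trailing newline
  while IFS= read -r line || [ -n "$line" ]; do
    # Skip blank lines and comment lines
    case "$line" in ''|\#*) continue ;; esac
    # Replace the literal text $USER with the current user name
    path="${line//\$USER/$USER}"
    # Drop a trailing slash for tidier output
    path="${path%/}"
    if [ -e "$path" ]; then
      printf '%-8s %s\n' ok "$path"
    else
      printf '%-8s %s\n' missing "$path"
    fi
  done < "$file"
}
```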

And then my Restic command to use this file would be something like:

restic -r /media/sschlinkert/external_harddrive/restic-repo/ --verbose backup --exclude-file=/home/sschlinkert/restic-excludes.txt /home/sschlinkert/

Removing snapshots

To remove old snapshots, restic has commands like forget and prune, which are detailed in the docs. I’m not 100% sure how these work yet, so I won’t go into them here!
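For reference, the general shape in the docs is a retention policy passed to forget, plus --prune to actually free up space. I haven’t run this yet, so treat it as a sketch: the keep-numbers are arbitrary examples, the function names are hypothetical, and it’s worth checking `restic forget --help` on your version before relying on any flag. forget only removes snapshot records; prune is what deletes data no remaining snapshot references.

```shell
#!/usr/bin/env bash
REPO="/media/sschlinkert/external_harddrive/restic-repo"
# Example retention policy (arbitrary numbers, not a recommendation):
# roughly one snapshot per day for 7 days, per week for 4 weeks,
# and per month for 6 months.
POLICY=(--keep-daily 7 --keep-weekly 4 --keep-monthly 6)

# Hypothetical helper: list which snapshots would be removed, removing none.
preview_forget() {
  restic -r "$REPO" forget "${POLICY[@]}" --dry-run
}

# Hypothetical helper: forget snapshots outside the policy, then prune
# the data that no remaining snapshot references.
apply_forget() {
  restic -r "$REPO" forget "${POLICY[@]}" --prune
}
```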

Wrappers around Restic

There’s a tool someone mentioned called Rustic, a Restic wrapper for easy backups, but I haven’t looked into it.

What do you use!?

Let me know on Mastodon or Twitter.


Appendix: Other archiving tools I found

All-in-one command line tools for archiving files

Online storage options

File types for archives

  • Bit Bottle (very alpha so far, but an interesting idea?)