Dealing with backups

How many backups is too many backups?

People tell me all the time: "you have too many backups!". I usually laugh: those who have lost tons (GB+) of data to some disk issue know that there's no such thing as too many backups. People don't realize how valuable data is, and how hard and expensive it is to (try to) recover it after it's lost.

It's hard to know what, when, and how to back up, and even where to. I, for example, have an automatic backup system with rsync on my desktop machine that sends the data to my local server. This backup is designed to prevent data loss from faulty drives, not from overall damage due to power issues (lightning?). It runs every 2h and backs up everything I consider important (somehow that includes over 90GB[1] of songs), such as personal data, college projects (believe me, they're useful every now and then), photos, virtual machines (oh yeah), and so on. I also have the same thing on my laptop, but I don't run it as often (every 3 months or so? oops!).
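To make that setup a bit more concrete, here is a minimal sketch of how such an rsync call could be wrapped in Python. It is not my actual script: the function names and the paths are made up for illustration, and it assumes rsync is installed and that the server is reachable over SSH. The only rsync options used are `-a` (archive mode) and `--delete`.

```python
import subprocess


def build_rsync_command(source, destination, delete=False):
    """Return the rsync argument list for mirroring source into destination."""
    # -a (archive) recurses and preserves permissions, timestamps and symlinks.
    command = ["rsync", "-a"]
    if delete:
        # --delete removes files from the destination that no longer exist
        # in the source; leave it off to keep deleted files in the backup.
        command.append("--delete")
    # A trailing slash on the source copies its contents, not the folder itself.
    command += [source.rstrip("/") + "/", destination]
    return command


def run_backup(source, destination):
    """Run rsync and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_rsync_command(source, destination), check=True)
```

Calling something like `run_backup("/home/me/important", "backup-server:/srv/backups/desktop")` (hypothetical paths) from a small script, and scheduling that script in cron with the time fields `0 */2 * * *`, would run it every 2 hours.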

I recently had to delete some of my data because not only was my backup drive getting full, but my desktop's drives were getting full too. I still have tons of it, but I deleted over 100GB of data in 2 or 3 days. That includes old VMs (some I compacted and left somewhere else), duplicated data[2], useless backups, etc. Just data that I don't care about anymore or simply won't use. But it's really hard to figure out what to delete: I store way too much information.

Anyway, when it comes to setting backups up, first of all you need to figure out what you need to back up. Your thesis is probably a good idea. Your work data and home projects are probably also something you might wanna keep a second copy of somewhere. Photos? Well, it depends on how much you value your memories. Music? Let's not comment on that. You need to keep backups of everything that, if lost, would leave you screwed, either with yourself or with someone else. Restoring lost data, such as recreating a whole development VM, can be a pain in the ass. Sit down, look at all your files and figure it out.

Now, how often do you modify such files? That will define when you should back up your data. I, for example, used to have a daily backup for my thesis. Sometimes even more often. My desktop has its backup script set to run every 2h because I run VMs and I want the modifications to them protected from a power loss or a system crash, for example. If you don't change your files very often, you could do your backups manually. But if you're like me, working with a bunch of files and unable to keep track of what you modified, you probably want to do this automatically. And if you're coding, you probably want to do this as often as possible, since losing your code can be (and usually is) terrible.

Touching on the topic of backup automation (how), there are many tools that can help you with that. I still prefer cron (or Task Scheduler on Windows) and rsync. Again, if you only change a few files here and there, you can copy them manually. But if it's a big project, you might wanna copy the whole folder - or folders. Things start to get really annoying when you miss a file or two and your backup ends up inconsistent. At that point it's usually a good idea to go looking for tools that will do everything you need in terms of copying your data. Or you can write your own script (or program) for that. Either way, this will give you a full backup ("full" as in "everything you need to copy"), which you can then automate by setting it to run on a few events. You could, for example, set it to run every X hours like I do, or when your system is about to shut down, or whenever your system has been idle for over 15 minutes. Find what suits you best and set it up.
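If you go down the "write your own script" road, something along these lines could be a starting point. It is only a sketch using Python's standard library, and the function names are mine: it walks a whole folder so that no file gets missed, and copies only what is new or has changed (judged by size and modification time, which is an assumption that is usually good enough but not bulletproof). It never deletes anything from the backup.

```python
import os
import shutil


def backup_tree(source, destination):
    """Copy new or changed files from source to destination.

    Returns the relative paths that were copied, sorted.
    """
    copied = []
    for folder, _subfolders, filenames in os.walk(source):
        relative_folder = os.path.relpath(folder, source)
        target_folder = os.path.normpath(os.path.join(destination, relative_folder))
        os.makedirs(target_folder, exist_ok=True)
        for name in filenames:
            source_file = os.path.join(folder, name)
            target_file = os.path.join(target_folder, name)
            if needs_copy(source_file, target_file):
                # copy2 also copies the modification time, which is what
                # needs_copy compares on the next run.
                shutil.copy2(source_file, target_file)
                copied.append(os.path.normpath(os.path.join(relative_folder, name)))
    return sorted(copied)


def needs_copy(source_file, target_file):
    """A file is copied when it is missing, or differs in size or mtime."""
    if not os.path.exists(target_file):
        return True
    source_stat = os.stat(source_file)
    target_stat = os.stat(target_file)
    return (source_stat.st_size != target_stat.st_size
            or int(source_stat.st_mtime) != int(target_stat.st_mtime))
```

The returned list is handy for logging: an empty list means nothing changed since the last run. Whatever triggers the script (cron, Task Scheduler, a shutdown hook) is a separate decision.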

Then there's the question I often ask myself: where should I store my backup? Most people say that the cloud is good practice. Actually, I don't think it is. First of all, you probably want to encrypt your backup if you're sending it to something like Dropbox or Amazon S3, just to play it safe. Second, you need a good connection and a lot of time to upload your first backup (which, in my case, is almost 800GB). The next backups can be differential ones (i.e. sending only the difference, as rsync and Dropbox usually do). However, if you have VMs like me, the differential backup won't help you that much. Third, you probably want your backup to be done as soon as possible: you just can't wait one hour for it to finish. Also, what if you don't have an Internet connection at the moment? How can you run a backup to the cloud that way? And restoring it might take a while. Meh, I don't like it, even though I do have some data on Dropbox (not nearly as much as my actual home backup).

Work and university servers can be a good choice depending on where you are. If you have access to a server in the facilities where you spend a lot of time, use that as a backup location. There you probably have a 100 or even a 1000 Mbps link to the server, so copying your data will be fast. In case of a disaster with your machine, the data is safe on their servers. In case of a disaster with their servers, the data is safe on your computer. In case of a disaster on both (like a fire or a nuke[3]), well, then you're screwed. If you can, use remote locations for backups, such as other buildings[4]: as long as you have a good link to them, your network backup will be as fast as it should be.
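To put rough numbers on the difference between a cloud upload and a local link, here is a back-of-the-envelope calculation for a first backup of 800GB. The 10 Mbps upload speed is my assumption for a home connection, not a measurement, and the result is a best case: it ignores protocol overhead and slow disks (the `efficiency` parameter is there if you want to be more pessimistic).

```python
def transfer_hours(size_gb, link_mbps, efficiency=1.0):
    """Estimate hours needed to move size_gb gigabytes over a link_mbps link.

    Uses decimal units (1 GB = 8000 megabits). efficiency is the fraction of
    the nominal link speed you actually get (1.0 is the theoretical best).
    """
    megabits = size_gb * 8000
    seconds = megabits / (link_mbps * efficiency)
    return seconds / 3600


for label, mbps in [("10 Mbps upload (assumed)", 10),
                    ("100 Mbps LAN", 100),
                    ("1000 Mbps LAN", 1000)]:
    print(f"{label}: {transfer_hours(800, mbps):.1f} hours")
# 10 Mbps upload (assumed): 177.8 hours
# 100 Mbps LAN: 17.8 hours
# 1000 Mbps LAN: 1.8 hours
```

That is over a week of non-stop uploading versus less than two hours on a gigabit link, which is pretty much why I keep my main backup close to me.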

You can always do like me and have a local server at home for backups. It's not designed to protect against huge damage to the whole place, such as a fire, but to deal with disk failures, which happen far more often than you think. At home I have a good link to my server, so backups are fast. But then you also have to take care of your server, which can take time. Since I use it for other stuff as well, that's fine by me and it's a good enough solution for now.

Finally, there's always the external HDD or flash drive. You can always back up to those. I know a lot of people who do that and it works. It protects you from all sorts of issues, as long as you keep them disconnected and, if possible, in a remote location. I just consider it... slow. Sure, you can automate it: once the disk is detected by the OS, the backup script starts running and, after it finishes, disconnects the disk. If that works for you, great, go for it!
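The "once detected, start the backup" part could look roughly like this. It is a sketch under a few assumptions: the drive always shows up at the same mount point, polling every few seconds is acceptable, and `run_backup` is whatever backup function you already have (the name is a placeholder). Disconnecting the disk afterwards depends on the OS, so I left it out.

```python
import os
import time


def backup_when_mounted(mount_point, run_backup, is_mounted=os.path.ismount,
                        poll_seconds=5, max_polls=None):
    """Poll until mount_point is mounted, then call run_backup(mount_point).

    Returns True if the backup ran, False if max_polls ran out first.
    """
    polls = 0
    while max_polls is None or polls < max_polls:
        if is_mounted(mount_point):
            run_backup(mount_point)
            return True
        polls += 1
        time.sleep(poll_seconds)
    return False
```

With `max_polls=None` it waits forever, which is what you'd want for a script that sits in the background until the disk is plugged in.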

It doesn't really matter how your backup system works. All that matters is that you back up your data - always. Don't trust drives: one day they will stop working. It might take years, but they will stop. Also, Murphy hates you: whenever you need a backup, you don't have one. Then, my friend, you're more than screwed. Good luck!


  1. It's true :-(

  2. Copying files from the old computer to the new one using an external drive that already had all the data. Man, that can be confusing.

  3. I'm creative sometimes!

  4. Warning: a nuke will probably destroy the other buildings as well!