Rethinking the backup paradigm: a higher-level approach

In this post I explain my view of backups: several common practices are, in my opinion, suboptimal, and they become unnecessary, or at least easier, once you manage data at a higher level using patterns such as versioning important directories and distributed data management.

The "classic" approaches

There are many backup scripts and programs out there, each with its own (sometimes subtly, sometimes radically) different approach.
Most tools I've seen operate on layers 2, 4 (optionally with a little lvm help) and 5.
Everyone knows dd. It is the de facto tool for block device backups (usually combined with scp, gzip et al.). This solution is good for when your entire hard disk has died: after restoring, you have everything back, from the MBR (boot loader and partition layout), dm-crypt/lvm setups and specific filesystem tweaks such as reserved space settings and block size (although most users don't care about these), to your entire system (installed packages, config files, home directories, ...) and all your personal data and settings. But even with differential backups there are a few issues:
In this category fall the scripts and tools that run scp, rsync or similar algorithms over the contents of an entire filesystem. The cool part of this sort of tool (and of the one in the upcoming paragraph), compared to the previous type, is that it is easier to expose the contents of your filesystem from 'inside the backup' (because it is usually just a directory with a copy of all your files), so it becomes easier to restore specific files and directories if you want to, although it is still not suited for rolling back specific files/directories to specific versions.
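To make the file-level idea concrete, here is a minimal sketch in Python (my choice of language for illustration; the post itself names no implementation). It is not how rsync works internally: it only imitates rsync's default quick check, which skips files whose size and modification time are unchanged. The names `needs_copy` and `mirror` are hypothetical.

```python
import os
import shutil


def needs_copy(src, dst):
    """Return True if dst is missing or differs from src in size or mtime."""
    if not os.path.exists(dst):
        return True
    s, d = os.stat(src), os.stat(dst)
    return s.st_size != d.st_size or int(s.st_mtime) != int(d.st_mtime)


def mirror(src_root, dst_root):
    """Copy new or changed regular files from src_root into dst_root.

    Returns the sorted list of relative paths that were copied.
    """
    copied = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        rel_dir = os.path.relpath(dirpath, src_root)
        dst_dir = os.path.normpath(os.path.join(dst_root, rel_dir))
        os.makedirs(dst_dir, exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            if os.path.islink(src) or not os.path.isfile(src):
                continue  # this sketch skips symlinks, sockets, device nodes
            dst = os.path.join(dst_dir, name)
            if needs_copy(src, dst):
                shutil.copy2(src, dst)  # copy2 also preserves the mtime
                copied.append(os.path.normpath(os.path.join(rel_dir, name)))
    return sorted(copied)
```

Because the result is a plain directory tree, restoring a single file is just a copy in the other direction. What the sketch does not give you is history: each run overwrites the previous copy, which is exactly the roll-back limitation mentioned above.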
In this case ploughing through the pile of dotfiles and directories can be cumbersome. (As far as upgrading is concerned, it's not as if Linux distributions help the user with this when following the 'upgrade wizard', so this manual ploughing and copying must be done at some point anyway.) More bad news is that from this layer upwards you do not back up your MBR and filesystem settings (or whatever is in between: lvm, dm-crypt, ...), so restoring a full system is a bit harder. You could still do it, of course, by backing up the MBR separately and/or using scripts to parse the output of fdisk/tune2fs, but it gets complicated to reconstruct a dm-crypt/lvm setup.
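As a rough illustration of capturing the MBR and filesystem settings separately, here is a minimal Python sketch. It assumes a classic MBR disk (one 512-byte sector ending in the bytes 0x55 0xAA, so not GPT) and the "Key: value" listing that `tune2fs -l` prints for ext2/3 filesystems. The function names `read_mbr` and `parse_tune2fs` are made up for this example, and nothing here handles lvm or dm-crypt.

```python
MBR_SIZE = 512                 # classic MBR: the first 512-byte sector
BOOT_SIGNATURE = b"\x55\xaa"   # last two bytes of a valid MBR


def read_mbr(device_path):
    """Return the first 512 bytes of a device or disk image."""
    with open(device_path, "rb") as dev:
        data = dev.read(MBR_SIZE)
    if len(data) != MBR_SIZE or data[-2:] != BOOT_SIGNATURE:
        raise ValueError("no MBR boot signature found on %s" % device_path)
    return data


def parse_tune2fs(text):
    """Turn 'Key:   value' lines, as printed by `tune2fs -l`, into a dict.

    Lines without a colon (such as the version banner) are ignored.
    """
    settings = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep and value.strip():
            settings[key.strip()] = value.strip()
    return settings
```

In practice you would feed `parse_tune2fs` the captured output of `tune2fs -l` for each partition (which needs root) and write both results to files, so that they can be kept next to the rest of the backup. Reading a real device such as /dev/sda with `read_mbr` also requires root.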
These scripts are the ones that basically iterate over your entire VFS, check some ignore options and copy all the rest. The difference with the previous category is very subtle, especially because they are usually configured to back up only the filesystems stored on hard disks and to ignore the 'virtual' stuff, so in practice the result is the same. The mkstage4.sh script is a well-known example in the Gentoo/Arch community.

Going one step further...

Now... The way I see it, we should not focus on the goal of "making a backup of the entire system", because the result will be a mix of lots of different types of data all mangled together, not efficiently stored and not easily restorable in the ways that make most sense for the specific types of data. Also, all the "classic" approaches need to make compromises (see the advantages and disadvantages of the aforementioned techniques). I'm thinking especially of the many different restore scenarios that a good backup strategy must be able to handle. That's why we should work on the 6th layer: the user/OS based concepts and patterns.
I use 2 tools to help me with this: svn and ddm. (If you don't know what ddm is, have a look at its page so you get the basic concepts, especially the types of datasets.) Generally speaking, I have the following repositories on my home server:
To get the most out of these tools, a tightly organized, well-controlled home directory (system) is a must. Suffice it to say for now that half of the stuff that matters to me is in svn and can/should be reviewed and committed daily.

As far as data is concerned, I can do everything I need with svn and ddm: the former for configs and most of my text files, the latter for media and other bigger binary files. (I store all important ~/.* in svn except the .mozilla stuff: these directories are big and messy and go into ddm for now.)

But of course, data is not the only thing of importance on your system. Some things are not just files or directories but other properties of the system that must be backed up. We must be able to reconstruct them, so we need to be able to 'capture' and save their essence. I'm thinking of the bootloader, the partition table layout, the lvm/dm-crypt setup and the filesystem settings (which filesystem on which partition, which block size, reserved space, ...). Note that this is one of the issues that the filesystem/VFS backup methods also suffer from. For now I don't back this up myself, but I should. I think the best way to do it is to write scripts that parse the output of tools such as fdisk, lvdisplay, tune2fs, ... and store the results in svn. Restoring this? Read on!

Okay, so now we have some ideas on how to organize our data to maximize its usefulness, and we can restore it exactly the way we want thanks to ddm and svn. But there are 2 issues remaining:
To solve this problem, we need several things:
This is all stuff that I haven't implemented myself yet and am still thinking about. For the first item we could work with text files and a few scripts, or not entirely re-invent the wheel and (ab)use svn for this: let / be the working copy, svn:ignore most of the system, use svn:externals to pull in the parts your working copy needs (basically /home and /etc), and put the ddm dataset directories in place (without the files that ddm manages, of course). Then it's just a matter of 'ddm up' in the ddm datasets and we're done. If anyone has ideas for this, let me know :) For the other stuff we could use scripts that do a complete automatic installation.

So the end result of my backup strategy: my laptop, home workstation and home fileserver all use ddm and svn to manage everything of importance. I just need to do some svn commits and ddm buffer flushes. To be really sure, I then just have to make backups of the ddm and svn repositories on the fileserver and I'm done.

Any questions or remarks? Let me know!
Submitted by Dieter_be on Mon, 07/21/2008 - 20:21.