The Power of File System Snapshots (Part 1)
I'm more than certain that anyone reading this has, on at least one occasion, lost a file, overwritten a file with bad data, or had a file become corrupt. It is always a painful experience knowing that a piece of information you once had is now gone (the infamous lost-school-report scenario). However, there is good news: this kind of problem will soon be a thing of the past. Why, you ask? Well, we live in the future, where we can fit all the songs in the world in our pocket, access all the knowledge in the world at a moment's notice, and fly rich people into space.
The concept of snapshots is not new; it's been around for a while. Only in recent years, however, has it manifested in a way that is useful to a wider spectrum of users and operating systems. In the past, snapshots were just glorified copies of data, taking up expensive amounts of disk space. At best they consumed space equivalent to the delta of file changes, less if compression was used. They also caused performance hits to the system, as increased disk I/O always does.
The form of snapshot that makes the most sense operates at the block level, using pointers. This is much more intelligent than keeping full copies of the data set (the files). You do not need to be a file system expert to understand the benefits here. A file is just a set of blocks (of fixed or variable size). These blocks contain the binary bits that describe the data in the file: everything from the file header to the bad grammar in your blog posts and your pirated media. Everything stored on your computer's permanent storage ultimately boils down to blocks. (Cue the Lego analogy.)
If you have ever programmed with pointers, this will be a familiar concept. I will use the cliché blocks-and-arrows method to explain pointer-based snapshots.
OK, so let's say we write a file, creatively called 'File 1', onto an empty file system. For the sake of simplicity the file is, unrealistically, made up of only 3 fixed-size blocks. Now let's take a snapshot.
As you can see, we have taken a snapshot, because the blocks are now blue and I have labeled it 'Snapshot 1'. So what does this mean? Well, other than the fact that I changed the block color, it means that there is now a file system object that points to blocks A, B, and C. Those blocks can no longer be deleted or marked as free space, because if one of those 3 blocks were modified or deleted, my snapshot would no longer work. One very important note: right now 'Snapshot 1' does not take up any disk space, because the snapshot of the file system and the current state of the file system are equal. To clarify: if I have a file system with 150GB of data and I take a snapshot, that snapshot references 150GB of data but does not take up any additional disk space...yet.
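The pointer mechanics can be sketched in a few lines of Python. This is a toy model, not real file system code; the block names and block size are invented for illustration:

```python
# Toy model of a pointer-based snapshot. A "file" is an ordered list
# of pointers to blocks; a snapshot copies the pointers, not the blocks.
BLOCK_SIZE = 4096  # invented fixed block size, in bytes

file_1 = ["A", "B", "C"]   # the active file's block pointers
snapshot_1 = list(file_1)  # snapshot: copy the pointer list only

# Space charged to the snapshot = blocks it references that the
# active file no longer does. Right now the two are identical,
# so the snapshot costs nothing.
snapshot_cost = set(snapshot_1) - set(file_1)
print(len(snapshot_cost) * BLOCK_SIZE)  # 0 -- the snapshot is "free"
```

Copying three pointers is effectively free, no matter how large the blocks they point to are; that is the whole trick.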
OK, the next step is to do something obvious, like modify that file.
Alright, 'File 1' has now been modified. The modification removed block B and added block D. A few things have just happened regarding 'Snapshot 1'. It is now taking up disk space; more specifically, it is taking up disk space equivalent to the size of block B. 'Snapshot 1' still references blocks A, B, and C of 'File 1' as it was before our modification. The active file system, however, now looks like this.
Currently we have a file system with one file and one snapshot. The disk space used is that of blocks A, B, C, and D. Even though block B is not part of the active file, it is referenced by 'Snapshot 1' and therefore cannot be marked as free space. We can now roll back 'File 1' to how it looked in 'Snapshot 1'. Block B will then be restored as active, and block D, no longer referenced by anything, will be marked as free disk space (unless you created a snapshot after we modified 'File 1', in which case that snapshot would now take up disk space equivalent to block D).
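Continuing the toy model, the modify-then-rollback sequence looks like this. Again, block names are invented; a real file system tracks per-block reference counts, which is all this sketch does:

```python
# Toy model continued: modify the file, then roll it back.
BLOCK_SIZE = 4096  # invented fixed block size, in bytes

file_1 = ["A", "B", "C"]
snapshot_1 = list(file_1)

# Modification: drop B, add D. The active file's pointers change;
# the snapshot keeps pointing at the old blocks.
file_1 = ["A", "D", "C"]

# B is now charged to the snapshot -- it is the only block the
# snapshot references that the active file does not.
snapshot_cost = set(snapshot_1) - set(file_1)
print(len(snapshot_cost) * BLOCK_SIZE)  # 4096 -- the size of block B

# Rollback: point the active file back at the snapshot's blocks.
file_1 = list(snapshot_1)

# D is now referenced by nothing, so it becomes free space.
referenced = set(file_1) | set(snapshot_1)
free_blocks = {"A", "B", "C", "D"} - referenced
print(free_blocks)  # {'D'}
```

Note that the rollback itself is just another pointer copy, which is why it takes seconds rather than the time needed to restore the data from a backup.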
This might sound like snapshots will take up a lot of disk space, but that is not necessarily true. You just need to look at your storage usage: how much new data is being generated? How much of it is being modified, and at what rate? Answering those questions will help you set retention policies for how many snapshots to keep. Keep in mind that snapshots happen at the file system level, not on a per-folder or per-file basis.
Real world examples are fun for the whole family.
Recently I upgraded my laptop to Windows 7 and joined it to a Windows 2003 domain. I inadvertently logged in while the computer was in the wrong OU, so the wrong folder redirection policy was applied. Somehow my own actions, combined with Windows' sync tool, caused all of my documents and application data stored on our file server to be wiped out. I didn't notice until the next day. Normally this would be an 'OMG WTF!!??'-generating event. Thankfully, I built our file server on Solaris using ZFS. Every user has their own file system, to which their data is redirected via AD group policy. Using a simple cron job, I have all user data snapshotted every hour (retained for 24 hours) and at midnight every day (retained for 30 days). The moral of the story: two commands and 15 seconds later, it was as if it had never happened.
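For the curious, a setup like the one described might look something like the config fragment below. The dataset names and snapshot labels are invented, and the post doesn't show the actual commands I ran, so treat this as a sketch; a companion script (not shown) would prune snapshots past their retention window:

```shell
# Hypothetical crontab entries on the Solaris file server:
#   0 * * * *  zfs snapshot tank/users/alice@hourly-$(date +\%H)
#   0 0 * * *  zfs snapshot tank/users/alice@daily-$(date +\%Y-\%m-\%d)

# And a recovery in "two commands" might be:
zfs list -t snapshot -r tank/users/alice   # find the last good snapshot
zfs rollback tank/users/alice@hourly-09    # restore it
```

Note the escaped `%` signs in the crontab lines: cron treats an unescaped `%` as a newline.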