Of Backups and Archives
The time has come for us all to stop holding backup tapes for years at a time and pretending they're an archive.
May 21, 2009
We all have words or phrases that make our blood boil. For me, "That's the way we've always done it" is near the top of my list. Of course, it really means "I don't have a good reason for why we do things that way." In the 25 years I've been an independent consultant (which really means change agent), I've lost count of the times I've been hired to help an organization clean up some process only to hear "that's the way we've always done it," as if historical precedent should be the primary driver of the planning process.
So let me say once and for all -- the time has come for us all to stop holding backup tapes for years at a time and pretending they're an archive. While old DLT7000 or, even worse, DDS tapes at Iron Mountain may meet the legal definition of retention, they don't make a useful archive.
The existential difference between backup repositories and archives isn't the media they use or the hardware they're built on but their purpose. As a writer I find this clear in the language we use to describe the process of getting data from each type of data store.
We make backups in order to restore things like servers, databases, file systems, mailboxes or even individual files or email messages to their previous condition should they be lost, damaged, deleted or corrupted. Restores, in general, return things to their original place and condition so they can be used for their original purpose.
Archives, on the other hand, exist so data can be retrieved. Once retrieved, that data is usually used for a different purpose than the one it was created for. Emails can be restored to be answered or acted on, or they can be retrieved to settle an argument, legal or otherwise.
Backup repositories are therefore organized by context: where the data lived when it was backed up, and when that backup ran. Actually accessing the data requires restoring it and usually reconnecting the applications that support it. Anyone who's ever tried to recover data from an Exchange 5.5 backup can attest to just how much effort it takes.
On top of that, most backup applications are designed to restore data that was backed up recently and keep just a few months to a year of index data, so just figuring out which tape holds the June 14, 2006 backup of the executive home folders is a project.
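To make that concrete, here is a minimal Python sketch of a backup catalog that knows only context (source path, backup date, tape) and ages its index out after a retention window. The class names, the tape labels and the one-year window are all invented for illustration; no real backup product's catalog is being described.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass(frozen=True)
class CatalogEntry:
    source_path: str   # where the data lived when it was backed up
    backup_date: date  # when the backup ran
    tape_label: str    # hypothetical label of the tape holding it


class BackupCatalog:
    """Toy backup index keyed only by context, never by content."""

    def __init__(self, index_retention_days: int) -> None:
        self.index_retention = timedelta(days=index_retention_days)
        self.entries: List[CatalogEntry] = []

    def record(self, entry: CatalogEntry) -> None:
        self.entries.append(entry)

    def prune(self, today: date) -> None:
        # Drop index entries older than the retention window.
        # The tapes themselves may still be sitting in the vault.
        cutoff = today - self.index_retention
        self.entries = [e for e in self.entries if e.backup_date >= cutoff]

    def find_tape(self, source_path: str, backup_date: date) -> Optional[str]:
        for e in self.entries:
            if e.source_path == source_path and e.backup_date == backup_date:
                return e.tape_label
        return None  # index data has aged out; finding the tape is now a project


catalog = BackupCatalog(index_retention_days=365)
catalog.record(CatalogEntry("/home/executives", date(2006, 6, 14), "TAPE-0042"))
catalog.record(CatalogEntry("/home/executives", date(2009, 5, 14), "TAPE-0917"))
catalog.prune(today=date(2009, 5, 21))

old_tape = catalog.find_tape("/home/executives", date(2006, 6, 14))
recent_tape = catalog.find_tape("/home/executives", date(2009, 5, 14))
```

After pruning, the lookup for the 2006 backup comes back empty even though the tape still exists, while last week's backup is found immediately. Note also that nothing in the catalog can answer a question about what the files said.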
Archive your files, email, etc. with Mimosa Nearpoint, Enterprise Vault, MetaLogix PAM, Atempo's Digital Archive, EMC SourceOne or any of the seemingly hundreds of other archiving applications on the market, and it builds an index not just of your data's context (its location and backup time) but of its content as well, and that full-text index lasts as long as you've told it to retain the data. Now you can search for documents from June 1 to June 20, 2006 that include the keywords "Johnston," "Smythe" and "harassment."
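As a rough illustration of that kind of search, the Python sketch below keeps a tiny inverted index of content alongside each item's context and answers a date-range-plus-keywords query. It is not how any of the products named above work internally; the class names, the tokenizer and the sample messages are assumptions made up for the example.

```python
import re
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Dict, Iterable, List, Set


@dataclass(frozen=True)
class ArchivedItem:
    item_id: int
    source_path: str  # context: where the item came from
    created: date     # context: when it was created
    text: str         # content: what it actually says


class ArchiveIndex:
    """Toy archive index covering both context and full-text content."""

    def __init__(self) -> None:
        self.items: Dict[int, ArchivedItem] = {}
        self.postings: Dict[str, Set[int]] = defaultdict(set)  # word -> item ids

    def add(self, item: ArchivedItem) -> None:
        self.items[item.item_id] = item
        for word in set(re.findall(r"[a-z0-9]+", item.text.lower())):
            self.postings[word].add(item.item_id)

    def search(self, keywords: Iterable[str], start: date, end: date) -> List[int]:
        # Start with every item, then keep only those containing all keywords.
        ids = set(self.items)
        for kw in keywords:
            ids &= self.postings.get(kw.lower(), set())
        # Finally filter by the date range.
        return sorted(i for i in ids if start <= self.items[i].created <= end)


index = ArchiveIndex()
# Invented sample messages, for illustration only.
index.add(ArchivedItem(1, "mail/hr", date(2006, 6, 14),
                       "Johnston told Smythe the harassment complaint was filed."))
index.add(ArchivedItem(2, "mail/finance", date(2006, 6, 18),
                       "Smythe approved the budget."))
index.add(ArchivedItem(3, "mail/hr", date(2006, 7, 2),
                       "Johnston and Smythe discussed the harassment policy."))

hits = index.search(["Johnston", "Smythe", "harassment"],
                    date(2006, 6, 1), date(2006, 6, 20))
```

Here `hits` contains only item 1: item 2 lacks two of the keywords and item 3 falls outside the date range. No restore and no application reconnect is needed to get the answer.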
Of course, all that indexing takes time, so an archive solution won't Hoover up data as fast as a backup solution can. But remember that you only have to archive data once, where most people back up their data weekly. Good archive solutions do single-instance storage, compress files and then store them in multiple locations, which also takes some time but reduces storage space and the need to back up the archive.
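One common way to get single-instance storage is to key each stored object by a hash of its content, so identical files are kept once no matter how many mailboxes or folders reference them. The Python sketch below assumes that approach, using SHA-256 and zlib from the standard library; real archive products may do it differently, the names are hypothetical, and replication to multiple locations is left out.

```python
import hashlib
import zlib
from typing import Dict


class SingleInstanceStore:
    """Toy content-addressed store: one compressed copy per unique content."""

    def __init__(self) -> None:
        self.blobs: Dict[str, bytes] = {}  # content digest -> compressed bytes
        self.refs: Dict[str, str] = {}     # item name -> content digest

    def put(self, name: str, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self.blobs:
            # Pay the compression cost only the first time this content is seen.
            self.blobs[digest] = zlib.compress(data)
        self.refs[name] = digest
        return digest

    def get(self, name: str) -> bytes:
        return zlib.decompress(self.blobs[self.refs[name]])


store = SingleInstanceStore()
attachment = b"Quarterly report\n" * 1000  # invented, highly repetitive sample data

store.put("alice/inbox/1", attachment)
store.put("bob/inbox/7", attachment)  # same content, different mailbox

unique_copies = len(store.blobs)
stored_bytes = sum(len(b) for b in store.blobs.values())
```

Two mailbox entries end up pointing at a single compressed blob, and both still read back the original bytes. The hashing and compression are the extra time the archive spends up front in exchange for the smaller footprint.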
Next time we'll talk about storage for archive data. Hint: Spinning rust isn't the only option.