Long-Term Data Retention: It's More Than Media
April 4, 2011
The other day I managed to get one of those rare moments where I was sitting with a group of storage guys who worked for vendors but weren't trying to sell me anything, not even their companies' ideas. The conversation turned to long-term archiving, and I realized that we--as an industry--have been spending too much time worrying about where we store our data and not enough about how.
One member of our little group maintained that a disk array with spin-down MAID (massive array of idle disks) support, like a Nexsan SATABeast, was the best place for long-term data. He argued that its low acquisition cost and low power consumption made simple RAID the right home for archival data.
Another liked EMC's new Data Domain archiving solution. Because it deduplicates data, he contended, the savings in floor space and power consumption would make up for the higher acquisition price.
The third brought up a problem inherent in either of the previous solutions--namely, that every five years or so you'd need to migrate your data from the disk array or dedupe appliance to a new model when your vendor terminated support for the old one.
He championed a scale-out archive storage system with a MAID architecture, like Dell's DX6000 or the HDS HCAP. As long as your vendor is still selling the archive platform, you can add new nodes with 2TByte drives to your cluster and have the system migrate your data onto them and off the old nodes with their 250GByte drives.

Just to be a contrarian, I took the pro-tape position. After all, tape is cheap, and tape on the shelf doesn't need power or even data center-grade air conditioning.
But as I made my arguments, I realized we were asking a single back-end storage system to solve all our problems. What we really need is more intelligence in the archive management software. Our archiving, or content management, system should be able to store its index on a disk platform and spread its data across multiple tiers of storage based on each item's retrieval urgency.
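To make that concrete, here's a minimal Python sketch of urgency-based tier placement. The tier names, the urgency levels, and the ArchiveObject and choose_tier interfaces are all invented for illustration; they aren't any real product's API.

```python
# Hypothetical sketch: map retrieval urgency to a storage tier.
# Tier names and urgency categories are made up for illustration.
from dataclasses import dataclass
from enum import Enum

class RetrievalUrgency(Enum):
    IMMEDIATE = 1   # must come back in seconds
    HOURS = 2       # a recall within hours is acceptable
    DAYS = 3        # rarely touched; days are fine

@dataclass
class ArchiveObject:
    object_id: str
    size_bytes: int
    urgency: RetrievalUrgency

def choose_tier(obj: ArchiveObject) -> str:
    """Pick a tier for the bulk data. The index itself always stays on
    the disk tier so lookups remain fast; only the data moves down to
    slower, cheaper media."""
    if obj.urgency is RetrievalUrgency.IMMEDIATE:
        return "disk-array"        # spinning (or spun-down MAID) disk
    if obj.urgency is RetrievalUrgency.HOURS:
        return "dedupe-appliance"  # denser, but slower to restore from
    return "tape-library"          # cheapest per TByte, slowest recall
```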
The archive software should have simple processes for retiring a storage pool or media. A simple command should start the process of migrating the data from an old disk array to a new one, or from 400 LTO-3 tapes to 100 LTO-5 tapes so you can retire your LTO-3 drives. A message when the migration is complete can then tell you it's OK to turn off your old storage.
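Here's a rough sketch of what that retire-a-pool command might do under the hood. The catalog and mover interfaces and the pool names are hypothetical; a real implementation would also need to batch, throttle, and resume interrupted migrations.

```python
# Hypothetical sketch of a "retire this pool" operation. The catalog
# (index) and mover (data path) objects stand in for whatever interfaces
# a real archive package exposes.
def retire_pool(catalog, mover, old_pool: str, new_pool: str) -> None:
    """Copy every object off old_pool, verify each copy, repoint the
    index, then declare the old pool safe to power off."""
    for obj_id in catalog.objects_in_pool(old_pool):
        data = mover.read(old_pool, obj_id)
        mover.write(new_pool, obj_id, data)
        # Verify before deleting anything from the source media.
        if mover.checksum(new_pool, obj_id) != catalog.checksum(obj_id):
            raise IOError(f"verify failed for {obj_id}; source copy kept on {old_pool}")
        catalog.update_location(obj_id, new_pool)  # index now points at new media
        mover.delete(old_pool, obj_id)
    print(f"Migration complete: {old_pool} may be powered off and retired.")
```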
The archive software can, as some packages do, also be responsible for data reduction and for tracking the multiple copies of your archival data that long-term retention requires. It doesn't matter how reliable a storage system you use if the building it's in burns to the ground. Your data needs to be stored in multiple locations, and the archive system should keep track of where those copies are--even if it relies on the storage back end to get the data from point A to point B.
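Finally, a toy sketch of the copy-tracking side: a catalog that remembers where every copy of every object lives and can tell you which objects fall below your copy target when a site is lost. The class, the site names, and the two-copy policy are all made up for illustration.

```python
# Hypothetical multi-site copy catalog. Site and pool names are invented.
from collections import defaultdict

class CopyCatalog:
    def __init__(self, required_copies: int = 2):
        self.required_copies = required_copies
        self.locations = defaultdict(set)  # object_id -> {(site, pool), ...}

    def record_copy(self, object_id: str, site: str, pool: str) -> None:
        self.locations[object_id].add((site, pool))

    def lose_site(self, site: str) -> list[str]:
        """Drop a destroyed site's copies and return the objects that now
        sit below the copy target and need re-replication."""
        at_risk = []
        for object_id, copies in self.locations.items():
            copies -= {c for c in copies if c[0] == site}
            if len(copies) < self.required_copies:
                at_risk.append(object_id)
        return at_risk

# Usage: two copies of one object, then the primary site burns down.
catalog = CopyCatalog(required_copies=2)
catalog.record_copy("doc-001", "hq-datacenter", "tape-library")
catalog.record_copy("doc-001", "dr-site", "disk-array")
print(catalog.lose_site("hq-datacenter"))  # ['doc-001'] -- re-replicate these
```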
Before you spend a lot of time and money on the storage part of your archive system, think about how you'll get the data there and whether some problems aren't better solved at that layer.