Do We Need Primary Storage Deduplication?
June 9, 2010
With the recent buzz around a few new primary storage deduplication products, I've seen the question of primary storage deduplication's value come up more than once. After all, if you are managing your storage correctly there shouldn't be much duplicate data, especially on primary storage, right? Sure, and we all archive all of our old data to tape as soon as it hasn't been accessed for 90 days. Even in a well-managed system there is redundant data on primary storage, so deduplication's benefits can be enormous.
First, as I hinted at earlier, storage is growing too fast and IT staffs are too overworked to manage it all. Extra copies of data are going to sneak in. That DBA is going to keep several copies of dumps; users are going to save versions 1, 2 and 3 of a file under different names and never go back to clean out the older copies. You get the picture. Then there are the more legitimate cases, like the company logo that is inserted into every slide of every presentation and memo stored on your servers. Primary storage deduplication will catch all of these instances for you when you don't have time to.
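The whole-file case is easy to picture. Here is a minimal Python sketch of the idea (an illustration, not any vendor's implementation): fingerprint file contents with a cryptographic hash, and any two paths with the same fingerprint are the same data stored twice.

```python
import hashlib
import os
from collections import defaultdict

def file_digest(path, chunk_size=1 << 20):
    """Hash a file's contents in 1 MB chunks so large dumps never sit in RAM."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()

def find_duplicates(root):
    """Group every file under `root` by content hash; any group with more
    than one path is redundant data a dedup engine would store only once."""
    groups = defaultdict(list)
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            groups[file_digest(path)].append(path)
    return {digest: paths for digest, paths in groups.items() if len(paths) > 1}
```

Note that the embedded-logo case is sub-file redundancy, which whole-file hashing misses; catching it takes block- or chunk-level fingerprinting, sketched below.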
The second area where primary storage deduplication will have a role to play is in the storage of virtualized server and desktop images. The redundancy between these image files is very high. Primary storage deduplication will eliminate this redundancy as well, potentially saving terabytes of capacity. In many cases, reading back deduplicated data carries little or no performance penalty.
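To show why image files deduplicate so well, here is a toy block-level store in Python. It is a sketch only: the fixed 4 KB blocks, the in-memory dictionaries and the `BlockStore` name are illustrative assumptions, where real engines use variable-size chunks, protected metadata and on-disk structures.

```python
import hashlib

class BlockStore:
    """Toy block-level dedup store: each unique 4 KB block is kept once;
    a file is just an ordered list of block fingerprints (the metadata)."""

    BLOCK_SIZE = 4096

    def __init__(self):
        self.blocks = {}  # fingerprint -> block bytes, stored exactly once
        self.files = {}   # file name -> ordered fingerprint list

    def write(self, name, data):
        recipe = []
        for i in range(0, len(data), self.BLOCK_SIZE):
            block = data[i:i + self.BLOCK_SIZE]
            fp = hashlib.sha256(block).hexdigest()
            self.blocks.setdefault(fp, block)  # a repeated block costs nothing new
            recipe.append(fp)
        self.files[name] = recipe

    def read(self, name):
        # Read-back is one lookup per fingerprint, which is why the read
        # penalty can be small.
        return b"".join(self.blocks[fp] for fp in self.files[name])

if __name__ == "__main__":
    import os
    store = BlockStore()
    base = os.urandom(40 * BlockStore.BLOCK_SIZE)  # stand-in for a guest OS image
    store.write("vm1.img", base)
    store.write("vm2.img", base + b"small delta")  # a near-identical clone
    print(len(store.blocks))  # ~41 unique blocks stored, not ~81
```

Two nearly identical images consume roughly the capacity of one image plus their differences; the second copy is almost entirely metadata.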
The third, and potentially the biggest, payoff is that deduplicating primary storage optimizes everything downstream: copies of data, backups, snapshots and even replication jobs should all require less capacity. This does not remove the need for a secondary backup; it is still a good idea to keep a standalone copy of data that is not tied back to any deduplication or snapshot metadata. Being able to deduplicate data earlier in the process does, however, potentially reduce how often that separate device is used, especially if the primary storage system replicates to a similarly enabled system in a DR location.
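To make the "less capacity" claim concrete, here is a back-of-envelope calculation; the retention count and change rate are assumptions picked for illustration, not figures from any product.

```python
# 30 retained full backups of a 10 TB volume, with about 2% of blocks
# changing between runs, all landing on the same deduplicated store.
full_tb, copies, churn = 10, 30, 0.02

raw = full_tb * copies                              # 300 TB without dedup
deduped = full_tb + full_tb * churn * (copies - 1)  # first copy plus deltas
print(f"{raw} TB raw vs. roughly {deduped:.1f} TB deduplicated")  # ~15.8 TB
```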
This effect makes backups merely copies of the same data; the backup application could back up to the same storage system, with no need for a second one. Archives become copies of files, perhaps with a Write Once Read Many (WORM) flag set on them, but the archive application would copy that data to the same storage system.

A lot has to occur for this level of data optimization to become a reality. First, the primary storage vendors need to offer a deduplication engine in their storage solutions. Second, the deduplication process and its handling of metadata will need to prove their reliability, and only time will provide that assurance. Until then, and for quite a while, I expect users to deploy deduplication slowly and with much testing. I do believe that dedupe on primary storage will eventually prove itself to be reliable. At that point, it will be interesting to see if deduplication downstream is rendered obsolete. My guess is that it will not be. While some users may place all their faith in primary deduplication, there is a certain comfort, and maybe even indisputable common sense, in keeping a separate copy of data on a totally different storage device.