Does Primary Storage Deduplication Kill Archive And Backup?

As we begin to test primary storage deduplication technology, our initial findings are that the latency it introduces may be a non-issue for many data centers and applications. It may soon be a non-issue for all data centers and applications. If you can get deduplication on primary storage for "free" from a performance perspective, what is the impact of primary storage on the other tiers of storage? Does primary storage deduplication kill archive and backup?

George Crump

July 2, 2010

Network Computing

Consider this scenario. In the not-too-distant future, you buy a storage solution from a vendor. It has a few shelves of solid state storage, a few shelves of 15K RPM SAS drives and many shelves of SATA storage. All of the storage is under the control of a single storage management software layer running either on the storage controller or across storage nodes.

Data is deduplicated either inline or post-process and may or may not be compressed. The result is that a 100TB storage system may now store 10PB of actual data, but only a small fraction of that data resides on either the SSD or SAS tier. This is because auto-tiering automatically moves data up and down the storage types based on age or another user-defined parameter. Performance is high and costs are under control.
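To make the capacity claim concrete: storing 10PB of data on 100TB of disk implies a combined reduction ratio of roughly 100:1 (10,000TB divided by 100TB). The following is a minimal sketch of how block-level deduplication produces that kind of ratio, not a description of any vendor's implementation. The fixed 4 KiB block size, the SHA-256 fingerprint and the names `dedupe`, `restore` and `dedupe_ratio` are all assumptions chosen for illustration.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size; real systems vary


def dedupe(data: bytes, store: dict[bytes, bytes]) -> list[bytes]:
    """Split data into fixed-size blocks and keep each unique block once.

    Returns the ordered list of block fingerprints needed to rebuild data.
    """
    recipe = []
    for offset in range(0, len(data), BLOCK_SIZE):
        block = data[offset:offset + BLOCK_SIZE]
        digest = hashlib.sha256(block).digest()
        store.setdefault(digest, block)  # only stored if not seen before
        recipe.append(digest)
    return recipe


def restore(recipe: list[bytes], store: dict[bytes, bytes]) -> bytes:
    """Rebuild the original data from its list of fingerprints."""
    return b"".join(store[digest] for digest in recipe)


def dedupe_ratio(logical_bytes: int, store: dict[bytes, bytes]) -> float:
    """Logical bytes written divided by physical bytes actually stored."""
    physical_bytes = sum(len(block) for block in store.values())
    return logical_bytes / physical_bytes


if __name__ == "__main__":
    store: dict[bytes, bytes] = {}
    # One 40 KiB "file" made of 10 distinct blocks, written 10 times.
    file_data = b"".join(bytes([i]) * BLOCK_SIZE for i in range(10))
    recipes = [dedupe(file_data, store) for _ in range(10)]
    logical = 10 * len(file_data)
    print(dedupe_ratio(logical, store))  # 10.0
    assert restore(recipes[0], store) == file_data
```

In this toy case ten identical copies yield a 10:1 ratio. Reaching the 100:1 of the scenario would take far more redundancy, or compression on top of deduplication.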

The impact of this type of storage system on archive and backup systems could be significant. I think that archive specifically could become a thing of the past: not the elimination of archive as a process, but the elimination of archive as a standalone storage system. If I can store 10PB of information in one system, why wouldn't I? No matter how cheap I make the disk archive, I still need primary storage. If the archive can reside in the SATA storage area, then keeping it there is something to consider, unless the primary storage vendor is charging a ridiculous premium for that SATA storage.

Backups are equally at risk. Most primary storage suppliers claim either unlimited or high numbers of snapshots, so rolling back in time can be covered. Most if not all primary storage suppliers can replicate data to a secondary site, so failure of the primary system or even the site does not mean data loss. The only concern is that something could go wrong in the handling of metadata. Corruption could be introduced, for example, or the system could simply fail when you hit 5PB of snapshots. We've seen no indication of that, but it could happen. At some point you are going to want your data on some other platform just in case this type of scenario occurs.

I don't think primary storage suppliers are quite there yet. For primary storage to become archive and at least short-term backup, suppliers will need to offer retention capabilities, improve scaling and eventually break their dependency on RAID as a protection scheme (at least on SATA storage). The result is that archive suppliers still have an important role to play for the near term, and backup system suppliers potentially for longer. The point is that if a primary storage supplier wanted to design a single system that could do it all, the potential now exists.

It will be interesting to see whether any of these suppliers can integrate deduplication throughout the stack, all the way through the backup and replication copies of the data set. I think, though, that for many users, myself included, there is always a good feeling about having another set of data on a different storage medium that has no dependency on the original storage system.
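As a rough illustration of what carrying deduplication through to the replication copy could mean, the sketch below sends a block to the secondary site only when the target does not already hold a block with the same fingerprint. It is a simplified model under stated assumptions, not any supplier's protocol: the `replicate` and `fingerprint` functions, the in-memory dictionary standing in for the remote system and the use of SHA-256 are all hypothetical.

```python
import hashlib


def fingerprint(block: bytes) -> bytes:
    """Content fingerprint used to recognize identical blocks."""
    return hashlib.sha256(block).digest()


def replicate(blocks: list[bytes], target_store: dict[bytes, bytes]) -> int:
    """Copy blocks to the target, skipping any it already holds.

    target_store stands in for the secondary site (hypothetical).
    Returns the number of bytes that actually had to be sent.
    """
    bytes_sent = 0
    for block in blocks:
        digest = fingerprint(block)
        if digest not in target_store:
            target_store[digest] = block
            bytes_sent += len(block)
    return bytes_sent


if __name__ == "__main__":
    target: dict[bytes, bytes] = {}
    first_pass = [b"a" * 4096, b"b" * 4096, b"a" * 4096]
    print(replicate(first_pass, target))   # 8192: the repeated block is sent once
    second_pass = first_pass + [b"c" * 4096]
    print(replicate(second_pass, target))  # 4096: only the new block is sent
```

Note that a copy replicated this way still depends on the same fingerprint metadata as the source, which is the dependency the paragraph above is wary of.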

This type of storage system also has implications for traditional dual-controller architectures. In our next entry we will look at the potential impact of all of these storage services on those controllers.
