Can We Get to a Single Point of Deduplication?

George Crump

July 29, 2009

With the EMC-Data Domain acquisition seemingly moving along, a new point of debate is going to arise: can we get to a single point of deduplication? Meaning, can all of your data tiers (primary, archive, and backup) be deduplicated by a single engine?


The argument makes sense, especially in deduplication. After all, the more data you put through a single deduplication process, the greater, in theory, your deduplication rate is going to be. And the fewer interfaces and processes you have to deal with for optimization, the easier managing the process becomes. The challenge is that, for the most part, what we have seen so far is a siloed approach to deduplication.
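To make that math concrete, here is a minimal, purely illustrative sketch of hash-based deduplication in Python (the fixed-size chunking and all the names are my assumptions, not any vendor's actual engine). The point it shows: two tiers deduplicated in silos each see a modest ratio, while pooling the same streams through one shared chunk index counts the common chunks only once.

```python
import hashlib

CHUNK_SIZE = 4096  # fixed-size chunking; real engines often use variable-size chunks

def dedup_ratio(streams):
    """Chunk every stream, keep one copy of each unique chunk,
    and report logical bytes / physically stored bytes."""
    store = {}  # chunk hash -> chunk bytes: the single shared index
    total = 0
    for data in streams:
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            total += len(chunk)
            store.setdefault(hashlib.sha256(chunk).digest(), chunk)
    return total / sum(len(c) for c in store.values())

primary = b"A" * 8192 + b"B" * 4096   # say, a virtual machine image
backup  = b"A" * 8192 + b"C" * 4096   # its backup repeats most of that image

print(dedup_ratio([primary]))          # 1.5 -- primary deduplicated alone
print(dedup_ratio([backup]))           # 1.5 -- backup deduplicated alone
print(dedup_ratio([primary, backup]))  # 2.0 -- one engine across both tiers
```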


Primary storage deduplication has been handled by NetApp or EMC on their respective NAS gateways. The best use cases for deduplication on primary storage are virtualized server images and, to a lesser extent, user home directories.


Archive deduplication has been handled by companies like Permabit and Nexsan, as well as, to some extent, NetApp on NearStore. There are also companies like Ocarina Networks that bridge primary and archive by both deduplicating data and moving it from primary to secondary storage. Data Domain, over a year ago, opened its appliances up for use as a secondary storage point.


Backup, of course, was and is dominated by Data Domain, but companies like Quantum, Sepaton, and Exagrid all have solutions in the space as well.


The current deduplication vendors could work on building out their solutions to scale up to primary storage performance (see Data Domain's DD880), or they could move their existing deduplication technology into other markets; see the increased speed of Ocarina Networks and Permabit, as well as their moves into cloud storage.


We also have the approach of CommVault, Atempo, Acronis, and EMC's Avamar: backup software with built-in data deduplication. As the deduplication capability is extended to the other modules they offer, like archive, they can begin to claim a single point of deduplication. The challenge, of course, is that all the data to be deduplicated must go through that engine. Clearly the backup software solutions have, or can integrate, archive, but I don't know how they would move into primary storage optimization, or whether they would want to.


With its purchase of Data Domain, EMC has more disparate deduplication solutions than anyone. If EMC takes full advantage of the Data Domain technology and integrates it into everything it does, it could get to a single point of deduplication. Even Avamar could play: use the Avamar technology to solve the problem Data Domain had, optimizing data before it crosses the network, while leveraging Data Domain's technology on the back end.
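To see why that combination is attractive, here is a rough, hypothetical sketch of source-side optimization (the DedupServer class and its hash-query exchange are illustrative assumptions, not Avamar's or Data Domain's actual protocol): the client hashes chunks locally, asks the back end which hashes it lacks, and ships only those.

```python
import hashlib

CHUNK_SIZE = 4096

class DedupServer:
    """Back-end chunk store, standing in for the target-side appliance."""
    def __init__(self):
        self.chunks = {}  # chunk hash -> chunk bytes

    def missing(self, hashes):
        """Tell the client which hashes have never been stored."""
        return [h for h in hashes if h not in self.chunks]

    def put(self, chunk):
        self.chunks[hashlib.sha256(chunk).digest()] = chunk

def source_side_backup(data, server):
    """Hash at the source; ship only the chunks the server lacks."""
    chunks = [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]
    hashes = [hashlib.sha256(c).digest() for c in chunks]
    wanted = set(server.missing(hashes))
    sent = 0
    for h, c in zip(hashes, chunks):
        if h in wanted:
            server.put(c)
            sent += len(c)
            wanted.discard(h)  # send each unique chunk only once
    return sent  # bytes that actually crossed the network

server = DedupServer()
print(source_side_backup(b"A" * 8192 + b"B" * 4096, server))  # 8192: A once, B once
print(source_side_backup(b"A" * 8192 + b"B" * 4096, server))  # 0: nothing new to send
```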


NetApp and Ocarina could continue to improve the re-hydration speed of their technologies to make read performance a non-issue, making primary storage a viable platform for deduplication. Ocarina can already maintain the deduplicated format as data moves through tiers, so landing on backup or archive disk would simply be another move for them.
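For a sense of what re-hydration means mechanically, here is a minimal sketch (the recipe and store names are illustrative, not how any of these products are built): a deduplicated file is an ordered list of chunk references, and every read has to chase those references back through the chunk store, which is why lookup speed, not deduplication math, governs read performance.

```python
import hashlib

CHUNK_SIZE = 4096
store = {}  # chunk hash -> chunk bytes

def write_deduped(data):
    """Store a file as a 'recipe': an ordered list of chunk hashes."""
    recipe = []
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]
        h = hashlib.sha256(chunk).digest()
        store.setdefault(h, chunk)
        recipe.append(h)
    return recipe

def rehydrate(recipe):
    """Rebuild the file by following the recipe through the chunk store.
    On disk, each lookup can mean a seek to a different location, so a
    sequential read of the file becomes scattered chunk fetches."""
    return b"".join(store[h] for h in recipe)

original = b"A" * 8192 + b"B" * 4096
assert rehydrate(write_deduped(original)) == original
```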


The ability to get to a single point of deduplication clearly exists; several companies are well on their way to getting there, and several more could. The next questions are: do you need a single point of deduplication, and can these systems scale, both in capacity and in metadata management, to meet the increasing demands that will be placed on them?
