Deduplication's Five Modes

George Crump

November 2, 2009

Network Computing

I want to back up a bit in our deduplication discussion. I have had trouble bracketing the deduplication field thus far, and maybe there is a different approach. Let's discuss the modes of deduplication. I think there are five: deduplication, replication, maintenance, rest and restore. There is a sixth mode, move to tape, which is still relevant for most data centers. I am going to pick these modes apart one at a time and I may spend several entries on a single mode. If I don't cover every aspect of each mode in a single entry, I ask your patience. If you think I missed a mode, let me know.

The deduplication mode is where the community, myself included, has spent much of its time arguing about when and where deduplication should be done. But if you are a user of this technology, while this mode is important, what should matter most is whether the deduplication process can be completed in an appropriate amount of time for your requirements, and whether the end result delivers a high enough level of optimization to make deduplication a worthwhile investment.

Deduplication can typically happen either before the data is sent to the backup device or when the data gets to the device. The advantage of deduplicating data before it gets to the device, commonly called source side deduplication, is that it reduces the demand on the backup network and should make the actual storing of the data relatively quick. The downside is that this requires a replacement of the backup application or a new agent from your current backup software supplier. Another potential downside is that there may be a performance impact on the server being backed up. The performance issue seems to have been reduced in recent years as the software suppliers have improved the agents. It also helps that there is now additional processing power in the CPUs of the servers being backed up. In short, they can do more tasks.
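To make the network trade-off concrete, here is a minimal Python sketch of source side deduplication. It is an illustration under assumptions, not how any particular product works: the fixed 4 KB chunk size, the SHA-256 fingerprint, the `split_chunks` and `source_side_backup` names, and the dictionary standing in for the backup device are all hypothetical choices of mine.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed fixed chunk size; real products may chunk differently


def split_chunks(data: bytes, size: int = CHUNK_SIZE):
    """Yield fixed-size chunks of data (the last one may be shorter)."""
    for offset in range(0, len(data), size):
        yield data[offset:offset + size]


def source_side_backup(data: bytes, target_store: dict) -> int:
    """Back up data, sending only chunks the target does not already hold.

    target_store maps a SHA-256 digest to chunk bytes and stands in for the
    backup device. Returns the payload bytes sent across the network
    (the small digest lookups are ignored for simplicity).
    """
    bytes_sent = 0
    for chunk in split_chunks(data):
        # Fingerprinting happens on the server being backed up, which is
        # where the potential performance impact comes from.
        digest = hashlib.sha256(chunk).digest()
        if digest not in target_store:
            target_store[digest] = chunk  # only new chunks cross the network
            bytes_sent += len(chunk)
    return bytes_sent


store = {}
first = b"A" * 8192 + b"B" * 4096   # first backup: 12 KB
second = b"A" * 8192 + b"C" * 4096  # second backup: one 4 KB chunk changed
sent_first = source_side_backup(first, store)
sent_second = source_side_backup(second, store)
print(sent_first, sent_second)
```

In this toy run the second 12 KB backup sends only the one changed chunk, which is the bandwidth saving described above. The price is the hashing work done on the backed-up server.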

The other option is to do the deduplication at the target. This can be done via the backup application itself or by the deduplication system. In these scenarios, the entire backup data set is sent across the network, no different than most other backup processes. The advantage is that there is little to no change in the backup agent or the backup process. The system approach, which is typically a disk based appliance with deduplication capabilities, is merely a target to the current backup application. The backup software option again requires a replacement of your current backup application.
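For comparison, a target side sketch in Python might look like the following. Again this is only an illustration under assumptions: the `DedupTarget` class, its method names, the fixed 4 KB chunks and the SHA-256 fingerprints are hypothetical and not taken from any real appliance. The point it shows is that every byte still crosses the network, and the savings appear only in what the target stores.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed fixed chunk size, for illustration only


class DedupTarget:
    """Hypothetical stand-in for a deduplicating disk based appliance."""

    def __init__(self):
        self.chunks = {}         # digest -> unique chunk bytes
        self.bytes_received = 0  # everything that crossed the network

    def ingest(self, data: bytes) -> list:
        """Receive a full backup stream and return its list of chunk digests."""
        self.bytes_received += len(data)
        recipe = []
        for offset in range(0, len(data), CHUNK_SIZE):
            chunk = data[offset:offset + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).digest()
            self.chunks.setdefault(digest, chunk)  # keep one copy per digest
            recipe.append(digest)
        return recipe

    def bytes_stored(self) -> int:
        """Total size of the unique chunks kept on disk."""
        return sum(len(chunk) for chunk in self.chunks.values())

    def restore(self, recipe: list) -> bytes:
        """Rebuild a backup from its ordered list of chunk digests."""
        return b"".join(self.chunks[digest] for digest in recipe)


target = DedupTarget()
monday = b"A" * 8192 + b"B" * 4096
tuesday = b"A" * 8192 + b"C" * 4096
monday_recipe = target.ingest(monday)
tuesday_recipe = target.ingest(tuesday)
print(target.bytes_received, target.bytes_stored())
```

Here the backup application simply writes its normal stream to the target, which is why little changes on the client side. The network carries both full backups while the target keeps only the unique chunks.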

Which one is best? Really, it depends. All claim very good performance. Source side deduplication may have some benefits in network bandwidth constrained environments but, like backup applications that are now adding deduplication, it requires a change of backup software. In both cases, the move to one of these products should be considered as seriously as switching to a new backup application. Deduplication systems, on the other hand, provide storage efficiencies with limited changes to the environment, but they do require the same continued investment in the backup infrastructure as the underlying data set grows.

Each one of the methods in the deduplication mode could earn an entry or two by itself, and I may return to this mode to do just that. For now, however, my next entry will focus on the next mode: replication.
