Hugo Patterson, Chief Architect, Data Domain

"At the end of the day, de-duplication is really a technology that simplifies many aspects of data management."

July 14, 2007


Hugo Patterson has a tough job. As chief architect for newly public Data Domain Inc. (Nasdaq: DDUP), he's partly responsible for keeping his company's momentum going.

It will not be easy. Data Domain's brand of data de-duplication helped put it on the storage map in a big enough way to fuel a $109 million IPO earlier this month. (See Data Domain Goes Public and Data Domain Closes IPO.) But the problem with going public is staying attractive to shareholders. (See Data Domain Dives In.) And that means choosing the right path to customer demand and choosing it again and again.

It's up to Patterson, a onetime NetApp lead architect who's into his sixth year at Data Domain, to help his company stay hot. That will prove increasingly challenging, as more vendors enter the data reduction and protection markets. (See Experts Share De-Dupe Insights, Quantum to Offer De-Dupe Duo, NetApp De-Dupes, Hifn CTO Outlines Strategy, and Analysis: Data De-Duping.)

We caught up with Patterson just after the company's Wall Street debut. We were cautioned about his SEC-mandated silence on things financial, but that didn't stop us from probing for other information.

— Mary Jander, Site Editor, Byte and Switch

Byte and Switch: Tell us where you're at. How do you define data de-duplication, and how does it fit the bigger picture of what Data Domain is doing?

Patterson: Data Domain produces a storage product, and its key differentiators are de-duplication at high speeds and extensive data protection.

The de-duplication is an enabler for very efficient replication, which is a very important part of data protection as well. For disaster recovery, you need to have a copy of data at another location.

At the end of the day, de-duplication is really a technology that simplifies many aspects of data management. It starts by reducing the size and cost of the storage. De-duplication is a cost-reduction enabler for data protection with disk, as opposed to tape. And anytime you can reduce or eliminate the amount of tape you're using, you've simplified your data management problems enormously.

Because of de-duplication, we can replicate data offsite efficiently, and customers are not burdened with the cost of managing tape.
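
[ED. NOTE: For readers who want to see why de-duplication makes offsite replication cheaper, here is a minimal Python sketch. It is an illustration only, not Data Domain's implementation: the fixed 4 KB chunk size, the SHA-256 fingerprints, and the function names ingest, replicate, and restore are all assumptions made for this example.]

```python
import hashlib

CHUNK_SIZE = 4096  # hypothetical fixed chunk size, chosen for illustration


def ingest(data: bytes, store: dict) -> list:
    """Split data into chunks, keep each unique chunk once, return its recipe."""
    recipe = []
    for start in range(0, len(data), CHUNK_SIZE):
        chunk = data[start:start + CHUNK_SIZE]
        fingerprint = hashlib.sha256(chunk).hexdigest()
        store.setdefault(fingerprint, chunk)  # duplicates are not stored again
        recipe.append(fingerprint)
    return recipe


def replicate(local: dict, remote: dict) -> int:
    """Copy only the chunks the remote site lacks; return the bytes sent."""
    sent = 0
    for fingerprint, chunk in local.items():
        if fingerprint not in remote:
            remote[fingerprint] = chunk
            sent += len(chunk)
    return sent


def restore(recipe: list, store: dict) -> bytes:
    """Rebuild the original data from its recipe of fingerprints."""
    return b"".join(store[fingerprint] for fingerprint in recipe)


if __name__ == "__main__":
    local, remote = {}, {}
    monday = b"A" * CHUNK_SIZE * 3 + b"B" * CHUNK_SIZE
    tuesday = b"A" * CHUNK_SIZE * 3 + b"C" * CHUNK_SIZE  # one chunk changed

    ingest(monday, local)
    print("Monday bytes sent offsite:", replicate(local, remote))
    tuesday_recipe = ingest(tuesday, local)
    print("Tuesday bytes sent offsite:", replicate(local, remote))
    assert restore(tuesday_recipe, remote) == tuesday
```

In this sketch, the second night's backup changes one chunk out of four, so only that one chunk crosses the wire to the remote site.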

Byte and Switch: Describe data protection.

Patterson: There are many aspects to data protection, and de-duplication plays an important role in all of them. It makes it more cost-effective to have multiple snapshots so you can keep more versions online.

Data protection requires you to have three functions in one mechanism. You want to make a copy of the latest version of a file every day, but also you need another copy, a separate copy of data on a separate piece of physical storage, so if something happens you have a backup. If you copy the data to a location that is physically removed from the first copy, or remote, then if there's an event like a fire or flood and the entire system that stored the original copy is lost, you still have another copy someplace else.

Another aspect of data protection is the safety of the storage system itself. If you are reducing the number of copies of data, it's extremely important that the data you maintain is well protected against loss.... Disks are not perfect devices. Occasionally a sector goes bad and data is unreadable. Storage systems are created by humans and have bugs. The processors and computing platforms that storage software runs on can have glitches, or memory bits can be corrupted. It's important to protect against all of those.

Byte and Switch: Speak to us about replication.

Patterson: Certainly, it's taking off. How error prone it can be to remove tapes and take them offsite makes it extremely attractive to replicate data offsite if at all possible. We've seen this historically for high-end data. If you look at transaction processing, at the systems at the core of many high-level enterprises, the ones that earn bread and butter day in and day out, and for which if there were downtime it would be costly – for a long time those have been replicated to other sites. The value of the data makes the cost of real-time replication worth it.

De-duplication makes it affordable to replicate all data, not just the high tier that contains the most valuable data.

Anyone would prefer to replicate data offsite if they could afford to do that. De-duplication is the technology that makes that possible. This is particularly true of enterprises with many remote sites – dozens to hundreds – all needing some kind of data protection. In many of these sites, they have a very thin IT staff and maybe no permanent resident IT people. So in a case like that, who will be responsible for removing that tape and getting it offsite?

In these cases, enterprises are very interested in a completely automated solution for data protection.

Byte and Switch: We hear a lot of arguments about where data de-duplication occurs. Data Domain does it inline, for example. What's the benefit of that?

[ED. NOTE: There are currently a couple of approaches to de-duplication: inline processing, which is offered by the likes of Data Domain and Diligent, among others; and post-processing, which is offered by Sepaton and others. Inline processing takes place as data is being received from the backup servers and before it is stored to disk, skipping a final step. Post-processing, as its name suggests, occurs after the backup, thus avoiding any interference with it.]

Patterson: It really comes down to a question of cost and performance. If you don't de-duplicate inline, you're going to write the data to disk someplace else in uncompressed, un-de-duplicated format, and that requires storage.

The whole idea of de-duplication is to reduce the disk footprint – why would you store all that data in non-de-duplicated format? Clearly, if you de-dupe inline you reduce the footprint and cost.

Further, when it comes to protecting data by getting it offsite, most environments can't afford to replicate data offsite until it's de-duplicated. If you don't de-duplicate immediately, you can't replicate immediately. It delays the time to protection. It also adds to the system.

If you look at the storage footprint and the time to protect the data, both argue strongly for doing it inline if you can. If you could do it fast enough, why wouldn't you?
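
[ED. NOTE: The footprint argument can be shown with a small Python sketch. It is a simplified model under stated assumptions, not a description of any vendor's product: the function names inline_backup and post_process_backup are hypothetical, chunks are compared by SHA-256 fingerprint, and the post-processing figure counts only the staged raw data, so it is a lower bound on the real peak.]

```python
import hashlib


def _fingerprint(chunk: bytes) -> bytes:
    return hashlib.sha256(chunk).digest()


def _unique_bytes(chunks) -> int:
    """Total size of the distinct chunks in a backup stream."""
    seen = set()
    total = 0
    for chunk in chunks:
        fp = _fingerprint(chunk)
        if fp not in seen:
            seen.add(fp)
            total += len(chunk)
    return total


def inline_backup(chunks):
    """De-duplicate as data arrives: duplicates never reach the disk.

    Returns (peak_disk_bytes, final_disk_bytes).
    """
    final = _unique_bytes(chunks)
    return final, final


def post_process_backup(chunks):
    """Land the whole backup on disk first, then de-duplicate afterward.

    Returns (peak_disk_bytes, final_disk_bytes). The peak counts only the
    staged raw data, so it understates the space needed during the pass.
    """
    staged = list(chunks)
    raw = sum(len(chunk) for chunk in staged)
    return raw, _unique_bytes(staged)


if __name__ == "__main__":
    # Ten identical 1 KB chunks plus one different chunk.
    stream = [b"x" * 1024] * 10 + [b"y" * 1024]
    print("inline (peak, final):      ", inline_backup(stream))
    print("post-process (peak, final):", post_process_backup(stream))
```

Both approaches end at the same final size in this model; the difference is the staging space the post-processing path needs before its de-duplication pass runs.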

Byte and Switch: Isn't there ever an argument for post-processing de-duplication?

Patterson: Not doing it inline is basically a crutch for those who can't do it fast enough.

Byte and Switch: What's the biggest problem with tape?

Patterson: Tape has a number of weaknesses, but the key problem in terms of data protection is that it's fundamentally manual. You have physical entities you need to manage. You need to remove a tape to get it offsite, put it in a box, and send it someplace else. Tapes can be dropped, broken, misplaced. The person driving the truck can make a wrong turn. There are many ways the manual process can fail.

Byte and Switch: Why are people still buying tape?

Patterson: A couple of reasons. Disk-to-disk is still fairly new, and IT staffs have policies. Changing to anything new isn't something you do rashly in a data center. But de-duplication products have been selling for a number of years. People are extremely receptive to this and they are making the move away from tape for many use cases.

Now, many of them do choose to continue to use tape for some aspects – for another layer of protection, for example. Many folks get started with disk backup by doing backups to a Data Domain restore on a daily basis and once a week making a tape to take offsite. Or, as they get used to disk, they find that once a month a tape for archival purposes is sufficient. But they no longer rely on tape for daily recovery processes.

If somebody needs an earlier version of a file back, or needs to do a full restore of a system, they are no longer relying on tape for that. Tape becomes an insurance policy only.

You don't have to stop doing tape cold turkey. Tape is never going to go away, though "never" is a long time. If you look at technologies that have established themselves in the way tape has, you see they don't go away – but use passes slowly.
