VTL Data De-Duplication Eases Storage Crunch

Arizona Republic newspaper adopts a Data Domain data de-duplication system to reduce storage needs and backup costs

March 13, 2009

The Arizona Republic newspaper functions as much like a storage vendor as a storage customer. It backs up its own data as well as that of other entities owned by its parent company, Gannett, using a variety of systems and backup schedules. Consequently, the publishing entity is often at the forefront of developments in backup technology, such as implementing data de-duplication technology to meet its growing storage requirements.

Founded in May 1890, The Arizona Republic is the state's largest and most influential newspaper. The company was also an early entrant in Internet media, launching a Website in 1995. In 2000, Gannett, which has close to 1,000 domestic news publications including USA Today under its aegis, gobbled up the paper as well as its associated properties.

The Arizona media firm now has roughly 450 to 500 servers, running a variety of operating systems and platforms, such as Linux, Solaris, IBM AIX, x86, and VMware. "In the media industry, companies develop a lot of custom applications because there are not a lot of off-the-shelf applications available," says John Tabor, principal system administrator at The Arizona Republic.

The company has more than 100 TB of information under management. Consequently, the company uses various backup approaches and systems, such as CommVault's Simpana, EMC's Legato, and Symantec's NetBackup. The media company has staggered the backup times on its servers, with much of the work being done on the weekends. Some data is backed up daily, some weekly.

Meeting backup windows has been an ongoing challenge, and one reason why the publishing entity was an early entrant in the virtual tape library (VTL) area. In 2003, The Arizona Republic decided against using tape to back up its growing SAN because that approach would have been expensive and time consuming. The media company selected a VTL from EMC to support the storage network.

The volume of data being backed up by its VTL has grown as the publisher added other systems, such as network-attached storage. The NAS, which is backed up daily, grew from 4 TB to 12 TB. "We were reaching the point where we needed to add another system," Tabor says.

The EMC product worked fine, but like many first-generation VTLs it did not support data de-duplication. So in 2007, the publishing company decided to examine ways to add that feature. The Arizona Republic narrowed its search to products from Data Domain and EMC. The former offered the simpler upgrade path. "The EMC system was a 'rip and replace' upgrade, while we could just drop the Data Domain system in and not have to buy new disks," Tabor says.

The Data Domain system's features enabled the publishing company to justify the cost of the acquisition. The new product cost about $120,000, about the same as buying additional disk capacity. But additional capacity was no longer needed because the VTL could more efficiently use existing storage. In addition, Symantec's NetBackup software, which supports the VTL, charges for raw capacity rather than virtual capacity, so data de-duplication let the newspaper increase its VTL storage usage without increasing its licensing costs.

Last summer, the company brought the Data Domain system in and tested it. The tests went well, so the system was quickly brought online. "We were creating VTLs on the same day that the system was installed," Tabor says. The new system delivered significant improvements, such as giving users nearly instant restores that take a minute or less.

However, there were a few limitations. At first, the system did not support multi-pathing, which is used on the company's redundant fiber optic data center network. That problem was quickly fixed.

There were also some initial concerns about performance. Eliminating redundant data through de-duplication reduces the amount of data that needs to be stored, cutting down the amount of time and storage each backup requires. Data Domain had been talking about 20X reductions -- shrinking 100 TB of data to 5 TB -- but the media company was realizing only 7X to 10X reductions. After sifting through performance reports, the Data Domain support staff determined that a one-time backup had skewed the results. "We had an emergency and had used the VTL to back up a photo archive," Tabor says. The first time a company uses data de-duplication features, the system sets up the necessary indexes; the savings come only with subsequent backups, which in this case never occurred.
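To illustrate why a single one-time backup drags down the overall ratio, here is a minimal Python sketch of chunk-level de-duplication. It is not Data Domain's implementation: the fixed 4 KB chunk size, the `DedupStore` class, and the `run_demo` helper are all assumptions made for illustration. The sketch keeps one copy of each unique chunk, keyed by its SHA-256 hash, and reports the ratio of data sent to data actually stored.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed fixed chunk size, chosen only for illustration


class DedupStore:
    """Toy content-addressed store (hypothetical): each unique chunk is kept once."""

    def __init__(self):
        self.chunks = {}        # SHA-256 digest -> chunk bytes
        self.logical_bytes = 0  # total bytes the backup jobs have sent

    def backup(self, data: bytes) -> None:
        """Split data into fixed-size chunks and store only unseen chunks."""
        self.logical_bytes += len(data)
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            self.chunks.setdefault(hashlib.sha256(chunk).digest(), chunk)

    def stored_bytes(self) -> int:
        """Bytes physically held after de-duplication."""
        return sum(len(c) for c in self.chunks.values())

    def ratio(self) -> float:
        """De-duplication ratio: data sent divided by data stored."""
        return self.logical_bytes / self.stored_bytes()


def run_demo():
    """Back up the same data ten times; return the ratio after run 1 and run 10."""
    store = DedupStore()
    # 64 distinct chunks standing in for an archive with no internal repetition.
    archive = b"".join(i.to_bytes(4, "big") * (CHUNK_SIZE // 4) for i in range(64))
    store.backup(archive)
    first = store.ratio()
    for _ in range(9):
        store.backup(archive)
    return first, store.ratio()
```

In this toy model, `run_demo()` returns a ratio of 1.0 after the first backup, because every chunk is new, and 10.0 after ten identical backups. A large data set that is backed up only once, like the photo archive, contributes at roughly 1X and pulls the system-wide average down.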

With the data de-duplication features now firmly entrenched, the media company is planning to expand its VTL. The company plans to change its VMware backups, moving them directly to disk and bypassing the Symantec NetBackup system. "Since adding the data de-duplication features, we have been managing our storage much more efficiently," Tabor says.
