United Doubles Up Data Centers

Airline uses Veritas and HDS replication software to ensure non-stop service

September 3, 2003


As revelations of a strained and shaky power grid continue to trickle out following last month’s massive power outage that hit the Northeastern U.S. and parts of Canada, many businesses worry that the next blackout could cascade their way (see DR Fans: Few Black Eyes From Blackout and Net Up, Phones Busy in Blackout).

But Boris Sherman, the director of operations at United Airlines Loyalty Services (ULS), says he’s not losing any sleep at night.

The August 14 blackout didn’t make it to Chicago, where ULS, United Airlines Inc.’s e-commerce subsidiary, is located, but Sherman insists that his company wouldn’t have lost a dime if it had. “If the blackout had come this far,” he says, “I’m very confident that both our data centers would have remained up.”

What makes Sherman so confident? Realizing that shutting down the united.com Website, which has more than 30 million unique visitors a day, for even a few minutes could deprive the company of millions in revenue, ULS implemented a Veritas Software Corp. (Nasdaq: VRTS) disaster recovery setup nearly two years ago, he says (see DR Fans: Few Black Eyes From Blackout).

[Ed. note: Sharp-eyed readers will recall that ULS last year installed CreekPath Systems Inc.'s SAN management software -- see United Puts SAN on Autopilot. These guys love their vendors, eh?]

While Sherman won’t say how much the company has pumped into its so-called “business resumption” solution, he claims that it is well worth the price. “It makes very, very clear business sense,” he says. “The return on investment was well under a year.”

Unlike many other high-end disaster recovery solutions that mirror data to different states -- or even different countries -- ULS feels comfortable synchronously mirroring its data to a secondary site 10 miles away, Sherman says. Each data center runs 20 Tbytes of total disk space and 200 servers.

While the two sites are not very far apart, he explains, they rely on three different power grids. The primary site feeds into two separate grids, while the secondary site is on yet another grid. In addition, the primary site has two diesel generators and a number of different uninterruptible power supply (UPS) systems, while the secondary site has a natural gas backup system and a separate UPS system.

“If you’re running a primary business out of a data center of any sort, and you’re dependent on power, you need to make sure you always have power,” he says.

ULS uses Veritas Volume Replicator host-based software (running on about eight different hosts), as well as subsystem-based replication software from Hitachi Data Systems (HDS) -- “just for our peace of mind” -- to synchronously replicate all information between the two data centers at all times. In addition, everything in both data centers is either clustered for failover, or hot clustered, using Veritas Cluster Manager.

Going with Veritas was the natural choice, Sherman says. “We’ve been using various flavors of Veritas products for four to five years now. It was natural progression to use Veritas Volume Replicator… It was an easy decision at the time.”
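For readers unfamiliar with the term, the defining property of synchronous replication is that a write is acknowledged only once both sites hold it. The Python sketch below illustrates that property alone; the `Site` class, the `synchronous_write` function, and the key-value store are hypothetical stand-ins, and nothing here reflects how Veritas Volume Replicator or the HDS software actually works (real products replicate at the volume or subsystem level).

```python
_MISSING = object()  # sentinel meaning "key did not exist before the write"


class Site:
    """Hypothetical data center modeled as an in-memory key-value store."""

    def __init__(self, name):
        self.name = name
        self.store = {}
        self.online = True

    def write(self, key, value):
        if not self.online:
            raise ConnectionError(f"{self.name} is unreachable")
        self.store[key] = value


def synchronous_write(primary, secondary, key, value):
    """Acknowledge a write only after both sites have stored it.

    If the secondary cannot take the write, the primary's copy is rolled
    back so the two sites never diverge, and the error is re-raised.
    """
    previous = primary.store.get(key, _MISSING)
    primary.write(key, value)
    try:
        secondary.write(key, value)
    except ConnectionError:
        if previous is _MISSING:
            del primary.store[key]
        else:
            primary.store[key] = previous
        raise
    return True
```

The cost of this guarantee is that every write waits on the link between sites, which is one reason synchronous mirroring is usually paired with short distances such as the 10 miles between the ULS data centers.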

But why such an elaborate setup? Being able to ward off disasters as large as last month’s blackout is, of course, a major issue, but the fully redundant system is also used for capacity testing, Sherman notes. In normal conditions, the secondary site is used as a testing and staging environment, where ULS can fully test capacity before launching new applications. “We wanted to build an environment that would allow us to do full-blown capacity testing.”

While the primary site is working properly, the secondary site functions purely as a staging and testing site. But as soon as it registers that traffic from the primary site is being rerouted, the staging and testing instances automatically shut down, and the data center transforms into a disaster recovery site.
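The role switch described above can be pictured as a two-state machine. The following Python sketch is only an illustration under stated assumptions: the `SecondarySite` class, the `observe` method, and the instance names are hypothetical, and the article does not say how ULS or Veritas Cluster Manager detects rerouted traffic or shuts down staging workloads.

```python
from enum import Enum


class Role(Enum):
    STAGING = "staging and testing"
    DISASTER_RECOVERY = "disaster recovery"


class SecondarySite:
    """Hypothetical model of the secondary data center's role."""

    def __init__(self):
        self.role = Role.STAGING
        # Placeholder names for staging/testing workloads (assumed).
        self.staging_instances = ["capacity-test", "staging-web"]

    def observe(self, primary_traffic_rerouted):
        """Switch to disaster recovery once rerouted traffic is detected.

        The switch needs no operator: staging instances are shut down
        and the site takes over production duties.
        """
        if primary_traffic_rerouted and self.role is Role.STAGING:
            self.staging_instances.clear()  # shut down staging and testing
            self.role = Role.DISASTER_RECOVERY
        return self.role
```

In this sketch the site stays in the disaster recovery role once it has switched; how and when ULS returns the secondary site to staging duty is not covered in the article.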

“There is no single point of failure. It is a fully automated, fully redundant system,” Sherman says. “Our system provides unattended, fully automated failover... If a data center were to go down at 3 o’clock in the morning, you don’t have to wait for someone to react.”

— Eugénie Larson, Senior Editor, Byte and Switch
