Utility Sings Virtualization's Praises
Las Vegas Valley Water District taps virtual machines as cornerstone of DR plan
November 9, 2006
LOS ANGELES -- VMWorld -- Dave Trupkin always thought disaster would come in the form of a fire or an earthquake. As senior systems administrator for the Las Vegas Valley Water District in Nevada, he's paid to consider such prospects -- as well as the contingencies.
But "The Disaster," as he refers to it, did strike over Labor Day weekend two years ago in the form of a data center power failure -- less flashy than a natural disaster maybe, but no less dramatic in its impact on the water utility's operations, he told an audience here today.
System power was supposed to have been restored the next day, but it took almost three days to get servers, systems, databases, and applications humming again. And that was only because Trupkin and his department were able to scare up some very tough-to-find generators, which LVVWD then relied on for eight more weeks till the electrical problem could be identified and repaired. (Overheated, parallel power lines had burned through their insulation.)
The utility was no stranger to VMware products or the upside of virtualizing servers and applications. In fact, use of the vendor's ESX Server to restore servers quickly and relatively painlessly after The Disaster "made us heroes with management," Trupkin laughed. It also cleared the way for broader use of virtualization across LVVWD's information systems and use of a collocation facility 10 miles away, hot-linked to LVVWD's data center with fiber pairs.
The Disaster pointed up the woeful inadequacy of the utility's "Tape and Pray" DR plan, and it sensitized management, IT staff, end users, and customers to the need for redundancy, smarter and faster backup, and electrical power issues beyond LVVWD’s control. "It takes six weeks to get a new circuit breaker installed for a new server," Trupkin says. "[Electrical] power is our main issue, so we had to learn to do more with less. And that led us to virtual machines," which wring maximum efficiency out of the processing power and capacity of each active server.
"We have 17 hosts now, and any new servers must be virtual, unless you can prove that there's no way to run an application in a virtual environment," Trupkin says. Application software vendors, once notorious for their pushback against virtualized apps, "have been dragged to the party kicking and screaming," he notes. Voice platform maker Avaya resisted the virtualization pressure till it saw what LVVWD could do with virtualized voice apps. "Now they push their other customers to virtualize."
An audience query, "Can you get them to call Cisco?" prompted some laughs and many nods of recognition.
Sadder but wiser (and a lot more tired) for the experience, Trupkin offered up these lessons learned:
Keep better tabs on user requirements. "We should have been doing regular reviews of application priorities," he said. In the wake of The Disaster, "demand for email caught us flat-footed," as he and his staff worked on systems or apps they presumed would be more important.
Be aware of how everything's connected. "You need to understand application dependencies and how systems interwork," Trupkin explains. When a field engineer goes out, fixes something, and then updates the record, that change flows across business and engineering systems. Similarly, bringing back an Oracle database after a failure requires access to DNS servers and domain controllers, secondary dependencies that get lost or ignored in a lot of DR planning.
Consider backup methods. Caching issues with primary servers and remote copies create some real I/O problems that can retard overall performance. A mix of synchronous and asynchronous copying has proven a practical solution for the utility, Trupkin says.
Weigh the human element. After four straight days onsite after The Disaster, Trupkin wishes there'd been more of a designated "rest area," rather than grabbing 20 winks under a conference room table. He also emphasized the criticality of communicating as much information as frequently as possible to secondary staff, especially when there are public-facing departments forced to explain why customers can't pay their bills.
Write it down. Critical knowledge about systems, locations, idiosyncrasies, or contact information may reside only in someone’s head (or speed dial). "Get it all on paper," Trupkin advises.
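The dependency lesson above lends itself to a small illustration. The following is a minimal sketch, not LVVWD's actual tooling: it assumes a hand-maintained map of which services must be running before others can start, and derives a restore order from it. The service names and the `restore_order` helper are hypothetical; only the Oracle, DNS, and domain controller relationship comes from Trupkin's talk.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical dependency map: each service lists what must be up before it.
# Only the oracle_db -> dns / domain_controller link is taken from the article.
DEPENDS_ON = {
    "dns": [],
    "domain_controller": ["dns"],
    "oracle_db": ["dns", "domain_controller"],
    "work_order_app": ["oracle_db"],
    "email": ["dns", "domain_controller"],
}


def restore_order(depends_on):
    """Return a start order in which every dependency precedes its dependents.

    Raises graphlib.CycleError if the map contains a circular dependency.
    """
    return list(TopologicalSorter(depends_on).static_order())


order = restore_order(DEPENDS_ON)
print(order)
```

Keeping a map like this in a reviewed document, rather than in someone's head, also serves the "write it down" lesson: the restore sequence can be regenerated whenever a dependency changes.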
— Terry Sweeney, Editor in Chief, Byte and Switch
Avaya Inc. (NYSE: AV)
Cisco Systems Inc. (Nasdaq: CSCO)
Oracle Corp. (Nasdaq: ORCL)
VMware Inc.