Reality IT: The Harsh Reality of Disaster Recovery

When a trigger-happy CFO crashes the backup systems, one IT team learns some tough lessons about proper disaster-recovery planning -- or lack thereof.

July 2, 2004


When we discovered the source of the outage, we knew we were in trouble. Power outages are among the most common scenarios for disaster-recovery planning, but we hadn't planned for a trigger-happy CFO taking down our backup system. From the look on his face, I wasn't sure whether our CIO, Steve Fox, was just thinking about strangling Beane, the CFO, or whether he was actually going to do it.

What Else Can Go Wrong?

We didn't know how long it might take to fix the transfer switch, so we swung into action, using our disaster-recovery plan as a guide. Following a meeting with my network manager, Dirk Packett, and my telecom manager, Sandra Hook, we were able to redirect incoming calls to one of our other call centers. This gave us a temporary workaround, but as the minutes ticked by, we noticed some glaring disconnects between our plan and the reality of the situation.

According to the plan, the downed call center was supposed to transfer operations to one of our warehouses several miles away. Unfortunately, most of the space at the warehouse had been converted into a repair lab since the last update of the recovery plan. There was some space left, so management decided we should move as many call-center employees into the backup location as possible.

The disaster-recovery plan didn't tell us where to find the computers for the backup location, so we began moving machines from the primary building to the backup site. This process took hours, and required us to scrounge up the necessary network hubs and switches. Luckily, the warehouse was cabled properly.

The plan did say the call-center staff should use paper forms until the computers came online, but guess what? No copies of the forms had been made. Phones were a problem, too, because the warehouse wasn't properly trunked to support a call center, and we didn't have the call-center phone-system features we required at that site.

Just as we were scrambling to solve these problems, we heard from the third-party contractor that maintains the UPS and generator. The transfer switch had been replaced. Thank goodness, because it was clear that our contingencies weren't meeting our needs.

Hard Lessons

We learned a lot from our self-inflicted disaster. First, make sure your backup facilities have the required resources. Second, make sure the backup arrangements for key groups and applications, such as call centers, are tested fully and specifically. We had been doing annual tests before the outage, but they didn't go deep enough. Third, consider calling in an expert. At ACME, we're bringing in a disaster-recovery planning consultancy to help us conduct a business-impact analysis and thorough annual testing.

Some 43 percent of U.S. companies never reopen after a disaster, according to statistics from the National Fire Protection Association. Another 29 percent close within three years. ACME survived its self-imposed calamity, but we aren't willing to risk it happening again.

Hunter Metatek is an enterprise IT director with 15 years' experience in network engineering and management. The events chronicled in this column are based in fact--only the names are fiction. Write to the author at [email protected].
