Continuity Upgrades DR/HA Monitoring Software

RecoverGuard 4.0 offers improved support for clusters and root cause analysis, and the ability to detect configuration drift resulting from infrastructure changes

March 17, 2009

NetworkComputing

It is expensive and time-consuming to maintain disaster recovery (DR) and high availability (HA) even under normal circumstances. The job gets harder with every change IT managers make to servers, storage, and other data center infrastructure, and they make such changes daily. If a change made in the primary data center is not also made at the secondary DR/HA site, there is a good chance the backup site won't do its job when disaster strikes.

Continuity Software has added new features to version 4.0 of RecoverGuard to combat what it calls "configuration drift." The software automatically detects changes to domains, DNS settings, installed products, patches, service packs, storage routing, operating system versions, kernel parameters, and hardware. If the same changes are not made at the DR/HA sites, each one can open a gap that leaves those secondary sites vulnerable.
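To make the idea of configuration drift concrete, here is a minimal Python sketch that compares attribute snapshots from a primary and a secondary site and reports every attribute that differs or exists on only one side. It is an illustration, not RecoverGuard's implementation: the flat name-to-value snapshot format, the `detect_drift` function, and the sample attribute names and values are all hypothetical, and collecting the snapshots in the first place is out of scope.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical snapshot format: attribute name -> observed value.
Config = Dict[str, Any]


def detect_drift(primary: Config, secondary: Config) -> List[Tuple[str, Any, Any]]:
    """Return (attribute, primary_value, secondary_value) for each attribute
    whose value differs between sites or that exists on only one site.

    An attribute missing from a site is reported as None for that site.
    Results are sorted by attribute name.
    """
    drift = []
    # Union of attribute names seen at either site.
    for key in sorted(primary.keys() | secondary.keys()):
        p_val = primary.get(key)
        s_val = secondary.get(key)
        if p_val != s_val:
            drift.append((key, p_val, s_val))
    return drift


if __name__ == "__main__":
    # Hypothetical sample data for one primary/secondary server pair.
    primary = {
        "os.version": "5.3",
        "kernel.shmmax": 68719476736,
        "dns.search": "corp.example.com",
        "patch.example-fix": "installed",
    }
    secondary = {
        "os.version": "5.2",
        "kernel.shmmax": 68719476736,
        "dns.search": "corp.example.com",
    }
    for attr, p_val, s_val in detect_drift(primary, secondary):
        print(f"{attr}: primary={p_val!r} secondary={s_val!r}")
```

Run against the sample data, this flags the mismatched `os.version` and the `patch.example-fix` entry that was never applied at the secondary site, while identical attributes are ignored.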

Continuity's founder and CEO Gil Hecht says large enterprises' IT departments are making hundreds of changes a day to the production systems in their data centers and not doing a good job of ensuring that the same changes are made to their DR/HA sites. "They'll do a test once a year of their DR site and find that none of the systems work. Or they'll turn on a high-availability cluster to do a failover test and then the cluster will shut down the production server, which guarantees there will be downtime. This is a very risky situation for companies," he says.

The product enhancements include a high-availability cluster verification tool, an availability adviser, a root cause analysis tool to point out which infrastructure changes may have led to a problem, and a deployment analysis tool to find and report on infrastructure assets and data that are not being protected.

"There are no products that can turn on all the servers to see if the changes made to the HA and DR servers are OK," says Hecht. "Companies have invested millions in these systems and don't know if they are working." In customer tests, he says, smaller companies are finding two to five problems a month that would result in the loss of critical data, while larger companies are finding more than 30 problems a month.

RecoverGuard scans systems using standard APIs rather than agents and builds a map of the systems and their interrelationships. It then uses a "gap database" of 3,000 known problems to identify potential issues within individual systems or data centers. Common problems RecoverGuard finds, Hecht says, include a file system spread across three SAN volumes of which only two are replicated to the secondary site, and snapshots of a database spanning four volumes that were not taken at exactly the same time, which leaves the copies inconsistent.
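The two examples Hecht gives can be expressed as simple checks. The Python sketch below is hypothetical and not how RecoverGuard works internally: `FileSystem`, `unreplicated_volumes`, `snapshot_skew`, and `is_consistent` are illustrative names. Comparing snapshot timestamps is also only a rough proxy for consistency, since storage arrays typically rely on consistency groups to capture related volumes at the same point in time.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Set


@dataclass
class FileSystem:
    name: str
    volumes: List[str]  # SAN volumes that back this file system


def unreplicated_volumes(fs: FileSystem, replicated: Set[str]) -> List[str]:
    """Return the volumes backing fs that are not replicated to the secondary site."""
    return [vol for vol in fs.volumes if vol not in replicated]


def snapshot_skew(snapshot_times: Dict[str, datetime]) -> timedelta:
    """Return the spread between the earliest and latest snapshot of a
    database's volumes (zero if there are fewer than two snapshots)."""
    times = list(snapshot_times.values())
    if len(times) < 2:
        return timedelta(0)
    return max(times) - min(times)


def is_consistent(snapshot_times: Dict[str, datetime],
                  tolerance: timedelta = timedelta(0)) -> bool:
    """Treat the snapshot set as consistent only if its skew is within tolerance."""
    return snapshot_skew(snapshot_times) <= tolerance


if __name__ == "__main__":
    # Example 1: a file system on three volumes, only two of them replicated.
    fs = FileSystem("/data", ["vol1", "vol2", "vol3"])
    print(unreplicated_volumes(fs, {"vol1", "vol2"}))  # ['vol3']

    # Example 2: a database on four volumes, one snapshot taken seven seconds late.
    t0 = datetime(2009, 3, 17, 2, 0, 0)
    snaps = {"vol1": t0, "vol2": t0, "vol3": t0,
             "vol4": t0 + timedelta(seconds=7)}
    print(snapshot_skew(snaps))   # 0:00:07
    print(is_consistent(snaps))   # False
```

The default tolerance of zero mirrors the "exactly the same time" requirement; a looser tolerance would only make sense if the application could tolerate it.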

Some vendors offer configuration management databases (CMDBs) to help IT managers monitor changes to their storage, servers, and networks, but those systems lack the analytical capabilities needed to uncover problems. "We are adding another layer of smart analytics on top of a CMDB," Hecht says.

Tools to help storage administrators and IT managers better prepare for, test, and manage disaster recovery systems are becoming increasingly important, says David Hill, the principal of consulting firm Mesabi Group. "Businesses are lucky that disasters are so rare because, if they were common, the failure of many businesses to perform disaster recovery properly would be exposed," he says.

Most companies fail to do adequate disaster recovery testing because of the time and cost involved, which could pose major problems if a disaster should actually take place. Tools like RecoverGuard can help because they identify the differences between what is and what should be in a non-disruptive fashion, Hill says, and the addition of gap detection for high availability and the ability to do root cause analysis are big steps forward. "Continuity Software has solved a big and often neglected problem in HA and DR infrastructures that otherwise would lead to preventable data protection failures," he says.
