IBM RVA (Not Shark) Downs Danish Bank

Danske Bank's critical IT systems went down for more than 24 hours after an IBM storage array failed. UPDATED 3/21 2:30 PM

March 22, 2003


Editor's Note: The original version of this story incorrectly reported that Danske Bank had experienced a failure in one of its IBM Enterprise Storage Server (a.k.a. Shark) systems. In fact, the system that failed was an older IBM Ramac Virtual Array (RVA). Byte and Switch regrets the error.

Danske Bank Corp. in Denmark saw its IT operations crippled for more than 24 hours after an IBM Corp. (NYSE: IBM) disk subsystem failure caused 90 databases to freeze up.

The problems began during a routine maintenance job on March 10, when a power unit in an IBM Ramac Virtual Array (RVA) storage system had to be replaced. During this process, the storage system went down, triggering a string of problems in recovering the data from its IBM DB2 databases, according to Peter Schleidt, CTO of Danske Bank.

"On the restart of the IBM DB2 subsystem, the problem really took hold," Schleidt tells Byte and Switch.

Danske Bank, the 18th-largest bank in Europe, has 3 million customers worldwide and manages approximately 50 Tbytes of data on a combination of IBM Enterprise Storage Server (a.k.a. Shark) and RVA storage arrays. In addition, the bank uses IBM DB2 database software at the core of its One Group One System infrastructure program, which aims to centralize the management of all the bank's back-office IT operations.

"Many large international banks have separate systems in each country, but centralizing internal finance, human resources, and payroll operations gives us a lot of benefits on the business side," says Schleidt. "It's an efficient architecture, and we have proven it works when we have acquired other banks."

Well, it was great until it broke down, Schleidt admits. When the bank attempted to restart DB2 from its last backup, a bug in the database software introduced inconsistencies into the data. The corruption brought Danske Bank's trading desks, currency exchange, equities trading, and clearing with other banks to a halt.

IBM engineers worked locally and from the company's DB2 labs in the U.S. to fix the problem. According to the bank, IBM is now sending out a patch to all its mainframe customers running DB2.

This wasn't the first time Danske Bank had run into problems with IBM storage equipment. In November 2002, one of its Shark subsystems disconnected from the network after a technician incorrectly configured two Sharks with the same IP address, according to an IBM spokesman. IBM attributes that outage, which temporarily took the bank's e-banking operations offline, to "human error" and says steps have been taken to prevent similar incidents in the future.

After Danske Bank's DB2 systems went down last week, it was able to fall back on its asynchronously mirrored site, 200 kilometers from the primary data center, to keep some of its services -- such as its ATMs and Internet banking service -- up and running. But because of the distance between the data centers, it has been unable to run all of its IT operations, many of which require synchronous mirroring between the two sites.

Synchronous and asynchronous data mirroring differ in when a write is acknowledged and, as a result, in the level of data protection they offer. With synchronous mirroring, data is written simultaneously to both the local and remote storage systems, so the local and the remote copy of the data are identical and concurrent at all times. For a bank's trading and clearing processes, this is a key requirement.

Synchronous mirroring requires that the remote system acknowledge receipt to the local system before a write is committed to disk on either system and before the next I/O is processed. It's appropriate when accuracy is critical and tolerance of data loss is very low.
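
To make that write path concrete, here is a minimal sketch in Python. The BlockStore and SyncMirror classes are hypothetical stand-ins for the storage arrays and the mirroring layer, not any IBM product API; they only illustrate the ordering of commits and acknowledgments.

    # Hypothetical sketch: BlockStore stands in for one disk subsystem,
    # SyncMirror for the mirroring layer. Not an IBM product API.
    class BlockStore:
        """An in-memory stand-in for one storage array."""
        def __init__(self, name):
            self.name = name
            self.blocks = {}

        def commit(self, block_id, data):
            self.blocks[block_id] = data
            return True  # acknowledge receipt of the write

    class SyncMirror:
        """Acknowledge a write only after both copies have committed it."""
        def __init__(self, local, remote):
            self.local, self.remote = local, remote

        def write(self, block_id, data):
            ok_local = self.local.commit(block_id, data)
            # This call blocks until the remote site acknowledges receipt;
            # over a long-distance link this round trip dominates latency.
            ok_remote = self.remote.commit(block_id, data)
            if not (ok_local and ok_remote):
                raise IOError("block %s not committed on both copies" % block_id)
            return "ack"  # only now may the application issue its next I/O

    primary = BlockStore("primary data center")
    secondary = BlockStore("remote site")
    mirror = SyncMirror(primary, secondary)
    mirror.write(42, b"trade record")
    assert primary.blocks == secondary.blocks  # identical at all times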

With asynchronous mirroring, such as that currently employed by Danske Bank, local and remote copy sets are created but are not identical and concurrent at all times. Asynchronous transfers write data to the local storage system, acknowledging I/O completion prior to synchronizing data with the remote storage system.
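
The asynchronous variant, sketched below with the same hypothetical BlockStore, acknowledges the write as soon as the local copy commits and lets a background thread forward writes to the remote site, so for a window of time the two copy sets differ.

    # Hypothetical sketch of the asynchronous variant, reusing BlockStore
    # from the sketch above. Again, not an IBM product API.
    import queue
    import threading

    class AsyncMirror:
        """Acknowledge as soon as the local copy commits; replicate later."""
        def __init__(self, local, remote):
            self.local, self.remote = local, remote
            self.pending = queue.Queue()
            threading.Thread(target=self._replicate, daemon=True).start()

        def write(self, block_id, data):
            self.local.commit(block_id, data)
            self.pending.put((block_id, data))  # queued for later transfer
            return "ack"  # returned before the remote copy is current

        def _replicate(self):
            # The remote copy catches up eventually, so for a window of
            # time the local and remote copy sets are not identical.
            while True:
                block_id, data = self.pending.get()
                self.remote.commit(block_id, data)

    mirror = AsyncMirror(BlockStore("primary"), BlockStore("site 200 km away"))
    mirror.write(7, b"payment")  # acknowledged immediately

The tradeoff is latency: a synchronous write pays the full round trip to the remote site on every I/O, which is why the bank's distant site mirrors asynchronously and why its planned synchronous site will sit closer to the primary data center.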

Danske Bank is fixing this problem with a third data center, located closer to its primary site, which will open in about two months and will perform synchronous mirroring. The bank will also add IBM's Geographically Dispersed Parallel Sysplex (GDPS) software on top, it says, in order to avoid rolling disasters.

Danske Bank's IT systems and Internet banks are operating normally now, the bank says. As a consequence of the IT problems, however, the bank expects to have to pay compensation to a number of customers.

In a statement, the bank said: "We are analyzing the causes of this unacceptable systems breakdown and will publish a statement when this work is completed." The inquiry is expected to take about a week, bank officials say.

According to Schleidt, Danske Bank has not yet negotiated monetary compensation with IBM for the most recent outage, but it does expect to do so at some point. For now, he says, it's more important to work with IBM "and agree on actions to minimize our operational risks for the future."

Jo Maitland, Senior Editor, Byte and Switch
