The Long-Distance LAN
Linking data centers for high availability is tricky. We have the plan you need.
November 7, 2011
Application failure can be pricey, particularly when it's a business-critical system. One uptime strategy is to create a data center interconnect (DCI) link, so that if a failure occurs in one data center, the application can continue to run in the second.
There are two approaches to making an application highly available via a DCI. First, you can set the application to be active in one data center and on standby in the second. In case of a problem at the first site, the application can switch over to the second data center and remain active. Hypervisor technologies like VMware's vMotion, which lets virtual machines move from one physical server to another, can assist in this process.
The second option is to synchronize the application so that it runs simultaneously in both data centers. Technologies such as clustering, data sharing, and storage replication can help you synchronize. However, many clustering and replication technologies depend on sharing a single Ethernet network, and expect to unicast, multicast, or broadcast Ethernet data to all elements--servers, databases, and storage--in the cluster.

The problem is that while Ethernet works well for a few hundred meters over copper in the data center, or even a few kilometers over fiber, beyond that you run into technical hurdles, including latency and bandwidth challenges, that make building a DCI difficult. Carriers have introduced services, such as virtual private LAN service, that are supposed to help IT solve some of these problems, but most have serious implementation limits and are often ill-suited to supporting highly available applications.

Still, there are ways around these challenges and some innovative alternatives for building a DCI. Your best options--which, as is often the case, are also the most expensive--are techniques such as multichassis link aggregation using dark fiber and dense wavelength division multiplexing (DWDM) services.
Latency Problems
Latency is a significant problem with few good solutions. There are three primary causes of latency, but the most significant and intractable is distance. The farther a signal must travel, the longer it takes to propagate through the provider's network. The most common baseline for acceptable latency between data centers is based on VM migration specs, such as those for vMotion for VMware vSphere servers. VMware states that there must be less than 5 milliseconds of latency between source and target servers. The practical upshot is that data centers cannot be more than 75 kilometers apart if you expect reliable operation of VM migration; 50 km is even better.
Latency also affects storage replication, especially synchronous replication, where the data-block write must be duplicated between sites within 5 to 10 milliseconds, depending on your recovery point and recovery time objectives for the application in question.
Another, less obvious, cause of latency is that carrier networks layer Ethernet services over transport and tunneling protocols, such as MPLS, ATM, and even Sonet. A particular problem with MPLS networks is that the carrier can't guarantee--and may not even know--the path any given packet will take between two points in its network. A circuit may hop through several nodes within a single city, adding milliseconds of processing latency each time the Ethernet frame is forwarded.
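The effect of those intermediate hops is easy to model. In the sketch below, each carrier node adds a small amount of processing and queuing delay on top of raw propagation; the per-hop figure is an assumption for illustration, not a measured carrier value.

```python
# Carrier-path sketch: one-way latency is fiber propagation plus whatever
# delay each intermediate node adds while forwarding the frame.
# The per-hop processing figure is an illustrative assumption.

PROPAGATION_US_PER_KM = 5.0
PER_HOP_US = 200.0             # assumed processing/queuing delay per carrier node

def one_way_ms(distance_km: float, hops: int) -> float:
    """One-way latency of a carrier path: propagation plus per-node delay."""
    return (distance_km * PROPAGATION_US_PER_KM + hops * PER_HOP_US) / 1000.0

# The same pair of sites, reached over dedicated fiber vs. a metro MPLS
# path that hairpins through several points of presence.
direct = one_way_ms(75, 0)
carrier = one_way_ms(120, 8)
print(f"Direct fiber, 75 km, 0 hops:  ~{direct:.2f} ms one way, ~{2*direct:.2f} ms round trip")
print(f"Carrier path, 120 km, 8 hops: ~{carrier:.2f} ms one way, ~{2*carrier:.2f} ms round trip")
```

A handful of unpredictable hops can consume most of a 5-millisecond budget before the application traffic even enters the picture.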
Loop Prevention
Ethernet's design introduces another technical hurdle in building a DCI. Ethernet is intrinsically a multiaccess technology: every broadcast and multicast frame must be received by all endpoints on the network. Therefore, any time a host sends an Ethernet broadcast frame, that frame must be forwarded across all connected Ethernet segments, including the DCI. When a broadcast frame is looped back into an Ethernet network, it is forwarded again by every switch, even though it has already been broadcast. Because Ethernet frames carry no time-to-live field, the copies circulate and multiply indefinitely--a broadcast storm that rapidly consumes network bandwidth and results in catastrophic network failure.
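To see why a single looped broadcast is so destructive, consider the toy simulation below. It models a pair of data centers whose core switches are redundantly cross-connected, with each switch simply flooding broadcasts out every port except the one they arrived on and no loop prevention in place. The topology and switch names are hypothetical, and the model is a simplification of real forwarding behavior.

```python
# Toy broadcast-storm simulation: two data centers, each with a pair of
# core switches, cross-connected by redundant DCI links and running no
# loop prevention. Each switch floods a broadcast out every inter-switch
# link except the one it arrived on. Ethernet frames carry no TTL, so
# copies never expire; with redundant paths they multiply instead.
# Topology and names are hypothetical.

from collections import Counter

LINKS = [
    ("dc1-sw1", "dc1-sw2"),   # intra-DC trunk, data center 1
    ("dc2-sw1", "dc2-sw2"),   # intra-DC trunk, data center 2
    ("dc1-sw1", "dc2-sw1"),   # DCI link
    ("dc1-sw1", "dc2-sw2"),   # redundant DCI cross-link
    ("dc1-sw2", "dc2-sw1"),   # redundant DCI cross-link
    ("dc1-sw2", "dc2-sw2"),   # DCI link
]

neighbors = {}
for a, b in LINKS:
    neighbors.setdefault(a, set()).add(b)
    neighbors.setdefault(b, set()).add(a)

# One broadcast enters at dc1-sw1 from a host-facing port.
in_flight = Counter({("dc1-sw1", None): 1})

for round_no in range(1, 9):
    next_hop = Counter()
    for (switch, came_from), copies in in_flight.items():
        for peer in neighbors[switch]:
            if peer != came_from:          # flood out all other trunks
                next_hop[(peer, switch)] += copies
    in_flight = next_hop
    print(f"forwarding round {round_no}: {sum(in_flight.values())} copies of the frame in flight")
```

One ARP request becomes hundreds of copies within a few forwarding rounds, and because the frames never expire, the storm continues until links saturate or the loop is broken.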
The Spanning Tree Protocol was developed many years ago to prevent such loops, and we still rely on it. The problem is that neither Spanning Tree nor its updated version, the Rapid Spanning Tree Protocol, works well over long distances. When the circuit latency is above 250 milliseconds, RSTP is no longer useful for loop prevention and resilience.
Because Spanning Tree is a poor technology for preventing loops in a DCI, vendors have developed a number of technically complex alternatives. Two good ones are multichassis link aggregation (MLAG) and Cisco's Overlay Transport Virtualization (OTV).
MLAG is the most common, and most commonly recommended, option for connecting data centers using point-to-point Layer 2 services, and most major networking vendors, including Avaya, Cisco Systems, Dell/Force10, Hewlett-Packard, and Juniper, offer implementations.
MLAG binds two or more Ethernet switches into a single operational unit. The basic idea is that two switch chassis share a single control plane, so Ethernet links that terminate on different chassis can be bundled, using the Link Aggregation Control Protocol (LACP), into one logical connection. MLAG works most effectively over short-haul circuits or DWDM, where you have access to dark fiber, and allows for native Layer 2 VLANs and Layer 3 routed services at the same time.
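As a rough illustration of what that bundling buys you, the sketch below mimics the kind of per-flow hashing an aggregated link uses to spread traffic across its member ports; the hash choice, member-link names, and addresses are illustrative, not any vendor's actual algorithm.

```python
# Flow-based load sharing across an aggregated link, the general idea
# behind LACP bundles (multichassis or otherwise): hash a few header
# fields so every packet of a given flow takes the same member link
# (preserving ordering) while different flows spread across the bundle.
# The hash choice, link names, and addresses are illustrative only.

import hashlib

# Two physical links, one terminating on each chassis of an MLAG pair.
MEMBER_LINKS = ["chassis-1:eth49", "chassis-2:eth49"]

def pick_member(src_mac: str, dst_mac: str, src_ip: str, dst_ip: str) -> str:
    """Deterministically map a flow to one member link of the bundle."""
    key = f"{src_mac}|{dst_mac}|{src_ip}|{dst_ip}".encode()
    digest = int(hashlib.sha256(key).hexdigest(), 16)
    return MEMBER_LINKS[digest % len(MEMBER_LINKS)]

flows = [
    ("00:50:56:aa:00:01", "00:50:56:bb:00:01", "10.1.1.10", "10.2.1.10"),
    ("00:50:56:aa:00:02", "00:50:56:bb:00:02", "10.1.1.11", "10.2.1.11"),
    ("00:50:56:aa:00:03", "00:50:56:bb:00:03", "10.1.1.12", "10.2.1.12"),
]
for flow in flows:
    print(f"{flow[2]} -> {flow[3]} hashes to {pick_member(*flow)}")
```

Because both members are active and traffic is hashed per flow, losing one chassis or one fiber path simply shifts flows to the surviving member, with no Spanning Tree reconvergence and no blocked links.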
Another option is Cisco's OTV, which encapsulates Ethernet frames into IP packets and therefore can use any Layer 3 transport between data centers, for a lower carrier cost. OTV is helpful for enterprises looking to manage, control, and gain visibility into their Layer 2 DCIs using existing Layer 3 MPLS services. However, OTV's benefits are offset by licensing and hardware costs, which can get expensive, and the fact that OTV is currently limited to Cisco Nexus 7000 switches and ASR 9000 routers. And its performance still depends on the underlying carrier service.
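Conceptually, the encapsulation works as in the sketch below: the original Ethernet frame becomes the payload of an ordinary IP packet that the carrier's Layer 3 network can route. The header layout here is a deliberate simplification for illustration, not the actual OTV wire format.

```python
# Conceptual MAC-in-IP encapsulation, the general idea behind overlay
# DCI schemes such as OTV: the original Layer 2 frame rides as the
# payload of a routable IP packet. Simplified illustration only; this
# is not the real OTV header format, and the checksum is left at zero.

import socket
import struct

def build_ethernet_frame(dst_mac: bytes, src_mac: bytes, ethertype: int, payload: bytes) -> bytes:
    """A bare Ethernet II frame (frame check sequence omitted)."""
    return dst_mac + src_mac + struct.pack("!H", ethertype) + payload

def encapsulate_in_ip(frame: bytes, src_ip: str, dst_ip: str, proto: int = 47) -> bytes:
    """Wrap an Ethernet frame in a minimal IPv4 header so it can cross a
    Layer 3 carrier network. Protocol 47 (GRE) is used here as a placeholder."""
    header = struct.pack(
        "!BBHHHBBH4s4s",
        0x45, 0, 20 + len(frame),     # version/IHL, DSCP, total length
        0, 0,                         # identification, flags/fragment offset
        64, proto, 0,                 # TTL, protocol, checksum (zeroed in this sketch)
        socket.inet_aton(src_ip),
        socket.inet_aton(dst_ip),
    )
    return header + frame

inner = build_ethernet_frame(
    bytes.fromhex("ffffffffffff"),    # broadcast destination, e.g. an ARP request
    bytes.fromhex("005056aa0001"),
    0x0806,                           # ARP ethertype
    b"\x00" * 28,                     # placeholder ARP body
)
outer = encapsulate_in_ip(inner, "192.0.2.1", "198.51.100.1")
print(f"inner frame: {len(inner)} bytes, encapsulated IP packet: {len(outer)} bytes")
```

The trade-off is the one described above: because the frames ride ordinary IP, any existing MPLS or similar Layer 3 service will carry them, but the encapsulation overhead and the latency of the underlying carrier path don't go away.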
Dark Fiber And Light Waves
Another option is to deploy your own cable runs between data centers using dark fiber. While this isn't always practical or even possible due to government regulations, it does provide the greatest certainty and least amount of complexity. If you have access to your own fiber, MLAG is the best option for simplicity and ease of operation for dedicated Layer 2 services.
If dark fiber isn't an option, you can investigate DWDM, which works by multiplexing your circuit directly onto a laser wavelength and then "repeating" the physical signal across the network. Your data isn't forwarded, routed, bridged, or encapsulated. You receive guaranteed bandwidth, and have full control over elements such as quality of service and traffic flows. It's a clear pipe.
However, the capital costs for both DWDM and dark fiber are significant, so the investment has to deliver a substantial return. That can be difficult, which is why many of the companies we work with run both Layer 2 and Layer 3 services over the same links. Because DWDM and dark fiber deliver guaranteed, unshared bandwidth from end to end, you control all elements of the system and can make your own decisions about QoS, traffic control, and performance.
Greg Ferro is a consulting network architect. Write to us at [email protected].