5 Tips for Optimizing Multi-Region Cloud Configurations

Managing a network of region-specific cloud environments comes with its own set of challenges.

David Balaban

November 18, 2024


Multi-region cloud configurations are popular for good reasons. Chief among them is the need for failover handling that ensures availability, speeds up disaster recovery, and prevents data loss in the event of major incidents or regional outages. We all know that regions can go down, and a multi-zone cloud isn’t always enough to avoid extended downtime.

Another driving factor is the desire to offer lower latency for end users. That's especially true if you have a global client base or if you use your cloud network to run lag-sensitive services like gaming, on-demand rich media streaming, or videoconferencing. Less often, you’ll choose a multi-region cloud to comply with data handling regulations, although that’s increasing in importance as regulations mount up.

In this post, I will share tips I've picked up for optimizing your multi-region cloud architecture. Yes, some will be more relevant for active-active environments, and some will be better suited for active-passive, but whatever your underlying use case, you'll find valuable advice.

1) Stay as DRY as Possible

Unfortunately, when DevOps teams configure a multi-region cloud, their dedication to Don’t Repeat Yourself (DRY) principles can fly out the window. There’s no need to rewrite the same code for each region.

Instead, use a Terraform map variable, which is a data structure in Terraform and OpenTofu for storing key-value pairs. You can declare what kind of values the map holds with a type constraint, such as “map(string)” or “map(object({...}))”.

For example, you can declare a variable named “region_map” with the type “map(string)”, using region names as the keys and Amazon Machine Image (AMI) IDs as the string values. Now you can look up the right AMI for each region without repeating the entire Terraform code.
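The lookup pattern behind a map variable can be sketched in a few lines of Python. This is only an analogy to the Terraform “map(string)” variable, not Terraform code: the dictionary plays the role of “region_map”, the AMI IDs are placeholders, and the function name and default instance type are assumptions made for illustration.

```python
# Hypothetical region-to-AMI map; the AMI IDs are placeholders, not real images.
REGION_MAP: dict[str, str] = {
    "us-east-1": "ami-00000000000000001",
    "eu-west-1": "ami-00000000000000002",
    "ap-southeast-1": "ami-00000000000000003",
}


def instance_config(region: str, instance_type: str = "t3.micro") -> dict[str, str]:
    """Build one region's instance settings from the shared map."""
    try:
        ami = REGION_MAP[region]
    except KeyError:
        raise ValueError(f"no AMI configured for region {region!r}") from None
    return {"region": region, "ami": ami, "instance_type": instance_type}


# One definition, reused for every region instead of one copy per region.
configs = [instance_config(region) for region in REGION_MAP]
```

Adding a region then means adding one entry to the map rather than duplicating a block of configuration.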

2) Centralize Provisioning Across Regions

Multi-region cloud configurations get very complicated very quickly, especially for active-active environments where you’re replicating data constantly. Containerized microservice-based applications allow for faster startup times, but they also drive up the number of resources you’ll need.

Even active-passive environments for cold backup-and-restore use cases are resource-heavy. You’ll still need a lot of instances, AMI IDs, snapshots, and more to achieve a reasonable disaster recovery turnaround time.

To keep track of where all your resources are provisioned, use a centralized dashboard like Amazon EC2 Global View. It brings together your Amazon Elastic Compute Cloud (EC2) and Amazon Virtual Private Cloud (VPC) resources across regions, such as VPCs, security groups, instances, subnets, and volumes. Another option is AWS Organizations with AWS Control Tower, which delivers centralized governance across your active regions.

3) Manage Data with Geo-Partitioning

Data sovereignty and data handling laws can be a serious headache when you’re configuring a multi-region cloud. You have to comply with the local regulations about data storage, data processing, minimization, access, etc.

You don’t want to have to impose strict GDPR requirements on data that you need for your U.S. cloud, so you might be tempted to set up separate databases for each region. But that can add friction when it comes to replication, dragging down data recovery speed. Maintaining a single multi-region database supports horizontal scalability for efficient data distribution and optimal performance.

Look for a multi-region database that enables geo-partitioning so that you can divide a single database into partitions based on geographic regions. The subset of data in each partition is stored and processed locally, thereby keeping latency to a minimum and simplifying compliance with local regulations. Just create partitions for specific rows and configure placement policies for each one, according to regulatory requirements.
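To make the idea concrete, here is a minimal Python sketch of row-level placement. It does not use any particular database's API; the country-to-region policy, the region names, and the function names are all assumptions chosen for illustration. A real geo-partitioned database would enforce the placement itself.

```python
from collections import defaultdict

# Hypothetical placement policy: the region that must store each country's rows.
PLACEMENT: dict[str, str] = {
    "DE": "eu-central-1",
    "FR": "eu-central-1",
    "US": "us-east-1",
}


def partition_for(row: dict) -> str:
    """Return the region whose partition should hold this row."""
    country = row["country"]
    if country not in PLACEMENT:
        # Refuse to guess: an unplaced row could end up violating a local rule.
        raise ValueError(f"no placement policy for country {country!r}")
    return PLACEMENT[country]


def partition_rows(rows: list[dict]) -> dict[str, list[dict]]:
    """Group rows by the region they must be stored in."""
    partitions: dict[str, list[dict]] = defaultdict(list)
    for row in rows:
        partitions[partition_for(row)].append(row)
    return dict(partitions)


rows = [
    {"id": 1, "country": "DE"},
    {"id": 2, "country": "US"},
    {"id": 3, "country": "FR"},
]
partitions = partition_rows(rows)
```

Raising an error for a country without a policy is a deliberate choice in this sketch: silently falling back to a default region would undermine the compliance goal.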

4) Choose Your Data Replication Tactic Wisely

The CAP theorem says a distributed system can guarantee only two of three properties: consistency, availability, and partition tolerance. Since we’re configuring for multi-region, partition tolerance is non-negotiable, which leaves a battle between availability and consistency. Yes, you can get close to both as long as the network between regions is healthy, but you’ll drive high costs and an outsized management burden.

If you’re running active-passive environments, opt for consistency over availability. This allows you to use Platform-as-a-Service (PaaS) solutions to replicate your database to your passive region. But active-active environments almost always prioritize availability, which is a bigger challenge.

The best option is to embrace asynchronous systems and replication for eventual consistency. Most systems use the “last write wins” approach to reconciliation, which requires you to design your applications carefully for non-blocking interfaces. All user interactions must resolve asynchronously and without a mandatory backend response. Decoupling requests to the server from the user interface hides network latency and sometimes even network failure.
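A “last write wins” merge can be sketched as follows. This is an illustrative Python model, not the behavior of any specific database: the Version fields, the tie-break by region name, and the assumption that regional clocks are roughly synchronized are all choices made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    value: str
    timestamp: float  # assumed: write time in seconds from roughly synced clocks
    region: str       # used only to break timestamp ties deterministically


def last_write_wins(a: Version, b: Version) -> Version:
    """Keep the version with the later timestamp; break ties by region name."""
    return max(a, b, key=lambda v: (v.timestamp, v.region))


def merge(local: dict[str, Version], remote: dict[str, Version]) -> dict[str, Version]:
    """Merge two replicas key by key using last-write-wins."""
    merged = dict(local)
    for key, incoming in remote.items():
        current = merged.get(key)
        merged[key] = incoming if current is None else last_write_wins(current, incoming)
    return merged


us = {"cart": Version("2 items", 100.0, "us-east-1")}
eu = {
    "cart": Version("3 items", 105.0, "eu-west-1"),
    "theme": Version("dark", 90.0, "eu-west-1"),
}
merged = merge(us, eu)
```

Because the winner depends only on the pair being compared, both regions reach the same result regardless of merge order. The trade-off is that the earlier write to “cart” is silently discarded, which is why the application has to be designed to tolerate it.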

5) Keep Your Routing Options Open

For active-passive environments, routing isn’t a serious concern. You’ll use default priority global routing to support failover handling, end of story. But for active-active environments, you’ll want different routing policies depending on the situation in that region.
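The priority-based failover decision for an active-passive setup is simple enough to sketch in Python. This models only the selection logic, not any DNS service's API; the function name, the region names, and the idea of passing in a set of healthy regions are assumptions for illustration.

```python
def failover_route(priorities: list[str], healthy: set[str]) -> str:
    """Return the highest-priority region that is currently healthy."""
    for region in priorities:
        if region in healthy:
            return region
    raise RuntimeError("no healthy region available")


# The primary comes first; the passive region only receives traffic on failure.
PRIORITIES = ["us-east-1", "eu-west-1"]
normal = failover_route(PRIORITIES, {"us-east-1", "eu-west-1"})
during_outage = failover_route(PRIORITIES, {"eu-west-1"})
```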

In these cases, you also don’t want to configure a separate DNS routing service for each region. Instead, use Amazon Route 53 to support a variety of traffic flow routing policies with DNS failover.

For example, select latency routing to send traffic to the region that gives the user the lowest latency; weighted routing to split traffic across multiple resources in proportions you specify; geolocation routing to route based on where the user is; and geoproximity routing to route based on the location of both users and resources, with an optional bias toward a given resource. You get the idea.
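Two of these policies, latency and weighted routing, reduce to small selection rules. The Python sketch below models only those rules and is not the Route 53 API; the function names, region names, and latency figures are hypothetical.

```python
import random


def latency_route(latencies_ms: dict[str, float]) -> str:
    """Pick the region with the lowest measured latency."""
    return min(latencies_ms, key=latencies_ms.get)


def weighted_route(weights: dict[str, int], rng: random.Random) -> str:
    """Pick a region with probability proportional to its weight."""
    regions = list(weights)
    return rng.choices(regions, weights=[weights[r] for r in regions], k=1)[0]


# Hypothetical measurements for one user.
nearest = latency_route({"us-east-1": 80.0, "eu-west-1": 25.0})

# Send roughly three quarters of traffic to one region, for example in a canary rollout.
rng = random.Random(42)
sample = [weighted_route({"us-east-1": 3, "eu-west-1": 1}, rng) for _ in range(1000)]
```

Passing the random generator in as a parameter keeps the weighted choice reproducible in tests.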

A Final Word on Optimizing Multi-Region Cloud Configurations

I've shared advice for the scenarios that most frequently leave DevOps teams wondering what to do next. But multi-region cloud configuration is always going to be a complex undertaking, especially for active-active environments. You'll always need to think through your primary objectives and decide what you'll prioritize and what you'll deprioritize to avoid burning through time, money, and energy.

About the Author

David Balaban

David Balaban is a computer security researcher with over 17 years of experience in malware analysis and antivirus software evaluation. David runs MacSecurity.net and Privacy-PC.com projects that present expert opinions on contemporary information security matters, including social engineering, malware, penetration testing, threat intelligence, online privacy, and white hat hacking. David has a strong malware troubleshooting background, with a recent focus on ransomware countermeasures.
