Introduction to Multi-Region Architecture

Reference URL: https://read.acloud.guru/why-and-how-do-we-build-a-multi-region-active-active-architecture-6d81acb7d208

To run an active-active architecture across multiple geographical regions, several requirements have to be fulfilled:

  • Data replication between regions must be fast and reliable.

  • You need a global network infrastructure that connects the different regions.

  • Services should be stateless as far as possible. Any state that remains should be shared between regions.

  • Applications should use regional resources and reduce synchronous cross-regional API calls.

  • DNS routing needs to be included in the architecture.
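The routing requirement can be made concrete with a small sketch. It assumes a latency-based policy with health checks, which is only one of several possible DNS routing policies. The region names, latency figures, and the `pick_region` helper are hypothetical and not taken from any particular DNS provider.

```python
def pick_region(latency_ms, healthy):
    """Return the healthy region with the lowest latency, or None if none is healthy.

    latency_ms: dict mapping region name -> measured latency from the client (ms)
    healthy:    set of region names currently passing health checks
    """
    candidates = {region: ms for region, ms in latency_ms.items() if region in healthy}
    if not candidates:
        return None
    return min(candidates, key=candidates.get)


# Hypothetical measurements for one client.
latencies = {"us-east": 20, "eu-west": 90, "ap-south": 210}

nearest = pick_region(latencies, {"us-east", "eu-west", "ap-south"})
after_failover = pick_region(latencies, {"eu-west", "ap-south"})
```

Because every region is active, losing one region only changes which region answers; no standby has to be promoted.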

Consistency, Availability & Partition tolerance (CAP)

For a distributed system, the CAP theorem states that it is impossible to simultaneously provide more than two of these three guarantees:

  • Consistency: Every read receives the most recent write or an error.

  • Availability: Every request receives a (non-error) response, without guarantee that it contains the most recent write.

  • Partition tolerance: The system continues to operate despite an arbitrary number of messages being dropped by the network between nodes.

The CAP theorem implies that, in the presence of a network partition, one has to choose between consistency and availability.

  • By using asynchronous replication, we deliberately give up strong consistency in favor of availability.

  • Asynchronous replication decouples the primary node from its replicas, at the expense of replication lag: a replica may briefly serve data that is older than the primary's.

  • The result is eventual consistency: once writes stop arriving, all replicas converge to the same state.
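The trade-off described in the points above can be illustrated with a toy in-memory model. This is a minimal sketch rather than a real replication protocol: the `Primary` and `Replica` classes are hypothetical, and the explicit `replicate()` call stands in for a background process that would normally ship changes with some delay.

```python
from collections import deque


class Replica:
    """A copy of the data that only changes when updates are delivered to it."""

    def __init__(self):
        self.data = {}

    def apply(self, key, value):
        self.data[key] = value

    def read(self, key):
        return self.data.get(key)


class Primary:
    """Accepts writes locally and queues them for later delivery to replicas."""

    def __init__(self, replicas):
        self.data = {}
        self.replicas = replicas
        self.pending = deque()  # changes not yet shipped to the replicas

    def write(self, key, value):
        # The write is acknowledged immediately; replicas are not consulted.
        self.data[key] = value
        self.pending.append((key, value))

    def replicate(self):
        # Ship queued changes in write order. The time between write() and
        # this call is the replication lag.
        while self.pending:
            key, value = self.pending.popleft()
            for replica in self.replicas:
                replica.apply(key, value)


replica = Replica()
primary = Primary([replica])

primary.write("cart", ["book"])
stale = replica.read("cart")       # the replica has not seen the write yet
primary.replicate()
converged = replica.read("cart")   # the replica now matches the primary
```

Between `write()` and `replicate()` the replica stays available but returns stale data, which is exactly the consistency that is given up. After delivery the two copies hold the same state.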

The Practical Approach

Now that we understand the CAP theorem for distributed systems, and bearing in mind that there are network latencies with replication across regions, we have two options for asynchronous replication:

  • Application level: Changes are propagated to other instances of the application residing in a different region. This is probably most effective with a federated message broker. RabbitMQ, with its federation plugin, is one broker that supports this.

  • Database level: Here, we select a database that is designed for asynchronous replication across regions.

Since modern cloud-native applications are usually stateless, most of the data that changes lives in the database. The preferred and practical approach is therefore to let the database handle replication. That way, we avoid writing extra application-level code for asynchronous replication.

If the application design allows a simple technology such as rsync for file replication to reduce the need for database-level multi-region replication, prefer that simple approach first.
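As an illustration of the simple approach, the sketch below builds an rsync command that mirrors a directory to a host in another region. The helper names, the host name, and the paths are hypothetical. It assumes rsync is installed on both sides and that SSH access to the remote host is already set up.

```python
import subprocess


def build_rsync_command(source_dir, remote_host, remote_dir):
    """Build an rsync command that mirrors source_dir to a remote host."""
    return [
        "rsync",
        "-a",        # archive mode: recurse and preserve permissions, times, links
        "-z",        # compress data in transit, useful over cross-region links
        "--delete",  # remove files on the target that no longer exist at the source
        f"{source_dir.rstrip('/')}/",  # trailing slash: copy the contents, not the directory itself
        f"{remote_host}:{remote_dir}",
    ]


def replicate_files(source_dir, remote_host, remote_dir):
    # check=True raises CalledProcessError if rsync exits with a non-zero status.
    subprocess.run(build_rsync_command(source_dir, remote_host, remote_dir), check=True)


# Hypothetical host and paths, shown for illustration only.
command = build_rsync_command("/srv/uploads", "replica.example.com", "/srv/uploads")
```

Running `replicate_files` from a scheduler at a fixed interval gives asynchronous, one-way file replication. The interval sets the upper bound on replication lag.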