Kartikay Luthra

Oct 9, 2023 · 9 min read

Node Discovery and Cluster Formation in Akka Clusters

In the last blog, we briefly introduced Akka Clusters: where they can be used, what problem they solve, and how they solve it. We also looked at some of the key concepts that make up the Akka Cluster framework and how they help us build concurrent and fault-tolerant applications. In case you missed it, here is a link to the previous blog: Akka Clusters. We recommend reading it first, as it gives the background for what we cover here.

In the world of distributed systems, Akka Clusters play a vital role in ensuring that individual nodes can communicate, coordinate, and work together seamlessly. The foundation of this functionality is that nodes can discover one another and form a cohesive cluster. In this blog, we take a technical look at how node discovery and cluster formation work in Akka Clusters.

Understanding Node Discovery

Node discovery is the process by which individual nodes within a distributed system become aware of each other's existence. Distributed systems use various mechanisms for node discovery, allowing nodes to find and establish connections with their peers. Let's explore some of these mechanisms and how they relate to Akka Clusters:

1. Multicast:

Multicast-based node discovery involves nodes broadcasting their presence and receiving broadcasts from others. This approach is efficient and suitable for smaller, self-contained networks. However, it may not be ideal for larger, more complex environments, because multicast is often restricted or unsupported beyond a single network segment. Here's a real-life example that shows how a multicast system works:

Create a Multicast Group: Imagine you have a group of friends who want to communicate using walkie-talkies in a park. To make it easier, you all decide on a specific channel to communicate, say Channel 7.

Broadcast Messages: When someone wants to say something, they simply speak into their walkie-talkie on Channel 7. The message is sent as a broadcast, which means everyone tuned to Channel 7 can hear it.

Listen for Messages: Each friend keeps their walkie-talkie tuned to Channel 7, listening for any messages that come through. When they hear a message, they can respond.

Everyone Hears the Message: All friends who are tuned to Channel 7 will hear the message. This allows them to communicate with each other even though they might be at different locations in the park.
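The walkie-talkie analogy can be modelled in a few lines. The sketch below is an in-memory simulation, not real IP multicast and not an Akka API; the class names `MulticastGroup` and `DiscoveringNode` and the example addresses are hypothetical. Each node that joins the group hears every other node's announcement and records the sender as a known peer.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical in-memory model of the "Channel 7" analogy.
// Not real IP multicast and not an Akka API: it only shows that one
// announcement on a shared channel reaches every listener on it.
class MulticastGroup {
    private final List<DiscoveringNode> members = new ArrayList<>();

    // Tuning in: the node starts listening on the shared channel.
    void join(DiscoveringNode node) {
        members.add(node);
    }

    // A single announcement is delivered to every other node on the channel.
    void announce(DiscoveringNode sender) {
        for (DiscoveringNode m : members) {
            if (m != sender) {
                m.onAnnouncement(sender.address);
            }
        }
    }
}

class DiscoveringNode {
    final String address;
    final Set<String> knownPeers = new LinkedHashSet<>();

    DiscoveringNode(String address) {
        this.address = address;
    }

    // Listening: every announcement heard reveals one more peer.
    void onAnnouncement(String peerAddress) {
        knownPeers.add(peerAddress);
    }
}
```

If three nodes join and each announces itself once, every node ends up knowing the other two, which is all the discovery step needs before nodes can connect directly.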

However, Akka does not provide multicast discovery out of the box, and this method is rarely used with Akka Clusters, for the following reasons:

Multicast discovery requires support from network hardware such as routers and switches, and that support is often missing or disabled across the many network segments of a large-scale distributed system.
Multicast traffic is also hard to encrypt. Common transport encryption such as TLS is designed for one-to-one connections, so securing a message sent to many nodes at once requires extra machinery such as shared group keys.
Many cloud providers do not support multicast on their networks, which further limits its use with Akka Clusters.
As mentioned, multicast discovery can also lead to scalability issues. As the number of nodes increases, the volume of multicast messages can become overwhelming, potentially causing network congestion and performance issues.
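To put the scalability point in numbers, here is a back-of-the-envelope sketch. It assumes every node announces itself once per interval and every other node receives and processes each announcement. With true IP multicast each node transmits only once, but the total receive-side work still grows with the square of the cluster size. The class name is hypothetical.

```java
// Back-of-the-envelope model of multicast discovery load.
// Assumption: each of n nodes announces once per interval and all
// n - 1 other nodes receive it, so n * (n - 1) deliveries per interval.
class MulticastLoad {
    static long deliveriesPerInterval(long nodes) {
        if (nodes < 2) {
            return 0; // nobody else to hear the announcement
        }
        return nodes * (nodes - 1);
    }
}
```

Under these assumptions, 10 nodes produce 90 deliveries per interval, while 1,000 nodes produce 999,000.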

2. DNS (Domain Name System):

DNS-based discovery relies on DNS records to locate other nodes in the cluster. Nodes query DNS for information about cluster members, making it a versatile option for discovering nodes across different network segments. It works for both small-scale and large-scale deployments, and it is well suited to discovering nodes dynamically:

1. Small-Scale Deployments:

Simplicity: In smaller deployments or environments with a limited number of nodes, DNS-based discovery can be relatively simple to set up. You can configure each node with the DNS name or IP address of other nodes, allowing them to find each other easily.

Low Administrative Overhead: Small-scale deployments often involve fewer nodes, which means less administrative overhead in managing DNS records and configurations. This simplicity can be advantageous when dealing with a smaller cluster.

Quick Setup: DNS-based discovery in small-scale scenarios is typically straightforward and can be set up relatively quickly. It provides a basic and effective way for nodes to discover and join the cluster without complex configuration.

2. Large-Scale Deployments:

Dynamic Discovery: In large-scale deployments with a significant number of nodes, DNS-based discovery can still be effective if used dynamically. Instead of configuring each node with the addresses of all other nodes (which can become impractical), nodes can query DNS for information about cluster members at runtime.

Load Balancing and Redundancy: DNS can be used to distribute traffic across multiple nodes by associating multiple IP addresses with a single DNS name. This approach can provide load balancing and redundancy, improving the scalability and fault tolerance of large clusters.

Integration with Cloud Services: In cloud environments, DNS-based discovery can be integrated with cloud service discovery mechanisms. Cloud providers often offer DNS-based service discovery, allowing nodes to discover each other across different network segments or regions.
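As a rough sketch of the runtime lookup described above, the snippet resolves one name to all of its addresses with the standard `java.net.InetAddress.getAllByName` call and pairs each address with a cluster port. This is how a single DNS name can stand for many nodes. The class name, the service name you pass in (for example a Kubernetes headless-service name), and port 25520 (Akka's default Artery remoting port) are assumptions. Akka's own DNS discovery method can also use SRV records, which carry the port as well; this sketch only does plain address lookups.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

// Sketch of DNS-based discovery at runtime: one name -> many contact points.
// The service name and port are supplied by the caller (assumptions).
class DnsContactPoints {
    static List<InetSocketAddress> resolve(String serviceName, int port)
            throws UnknownHostException {
        List<InetSocketAddress> contactPoints = new ArrayList<>();
        // getAllByName returns every address published for the name, so
        // re-running it picks up nodes that were added or removed.
        for (InetAddress addr : InetAddress.getAllByName(serviceName)) {
            contactPoints.add(new InetSocketAddress(addr, port));
        }
        return contactPoints;
    }
}
```

Calling `resolve` periodically, rather than once at startup, is what turns this into dynamic discovery.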

Key Considerations:

Dynamic DNS: For large-scale dynamic deployments, consider using dynamic DNS updates or DNS-based service discovery mechanisms, which allow nodes to register and unregister themselves with DNS servers as they join or leave the cluster.

DNS Caching: Be aware that DNS responses can be cached, potentially leading to delays in discovering changes in cluster membership. Strategies for handling DNS caching should be considered, especially in dynamic environments.
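One concrete caching layer is the JVM itself: lookups made through `java.net.InetAddress` are cached according to the standard JDK security properties `networkaddress.cache.ttl` and `networkaddress.cache.negative.ttl` (both in seconds). The minimal sketch below shortens them, assuming your discovery code resolves names through `InetAddress`; the helper name and the values used are illustrative, not recommendations. The JDK reads these properties once, so they must be set at startup before any lookup happens. Akka's own DNS discovery may go through Akka's asynchronous DNS resolver instead, which has separate caching settings.

```java
import java.security.Security;

// Hypothetical helper that shortens the JVM's own DNS caches.
// Must run before the first InetAddress lookup, because the JDK
// reads these properties only once.
class DnsCacheSettings {
    static void configure(int positiveTtlSeconds, int negativeTtlSeconds) {
        // How long successful lookups are cached.
        Security.setProperty("networkaddress.cache.ttl",
                Integer.toString(positiveTtlSeconds));
        // How long failed lookups are cached (0 = do not cache failures).
        Security.setProperty("networkaddress.cache.negative.ttl",
                Integer.toString(negativeTtlSeconds));
    }
}
```

A short positive TTL lets membership changes published in DNS show up sooner, at the cost of more DNS queries. Operating-system and cluster DNS caches still apply on top of this.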

Security: Ensure that DNS configurations and records are appropriately secured, since a misconfigured or tampered record could point nodes at the wrong hosts.

In summary, DNS-based discovery can be applied effectively in both small-scale and large-scale deployments. In smaller deployments, it offers a straightforward and easy-to-manage solution. In larger deployments, it can still be used dynamically and can be integrated with cloud services to accommodate the scalability and dynamic nature of large clusters. The choice between static and dynamic DNS configurations will depend on the specific requirements of your Akka Cluster deployment.

3. Custom Providers:

Akka Clusters also allow you to implement custom discovery providers tailored to your specific network environment and requirements. This flexibility ensures that Akka can adapt to diverse infrastructures.

Benefits of Custom Discovery Providers:

Flexibility: Custom providers offer the ultimate flexibility to adapt Akka Clusters to your needs. You have full control over the discovery process, allowing you to handle unique situations or requirements.

Integration: You can integrate Akka Cluster seamlessly with other systems, network architectures, or cloud services, ensuring that your cluster fits within your broader infrastructure.

Scalability: Custom discovery providers can be optimized for scalability and performance, making them suitable for large-scale distributed systems.
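To make the idea concrete, here is a sketch of a custom provider backed by an in-house registry. In Akka itself, the real extension point is the abstract class `akka.discovery.ServiceDiscovery`. The `DiscoveryProvider` interface below is a simplified, hypothetical stand-in that only mirrors its general shape (an asynchronous lookup by service name), and `RegistryDiscovery` with its in-memory map is likewise illustrative. Nodes register on startup and deregister on shutdown, similar to the dynamic DNS registration described earlier.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical, simplified discovery contract (NOT Akka's API).
interface DiscoveryProvider {
    CompletionStage<List<String>> lookup(String serviceName);
}

// A custom provider backed by an in-house registry, modelled as a map.
class RegistryDiscovery implements DiscoveryProvider {
    private final Map<String, List<String>> registry = new ConcurrentHashMap<>();

    // Called by a node when it starts up.
    void register(String serviceName, String address) {
        registry.computeIfAbsent(serviceName, k -> new CopyOnWriteArrayList<>()).add(address);
    }

    // Called by a node when it leaves.
    void deregister(String serviceName, String address) {
        List<String> addrs = registry.get(serviceName);
        if (addrs != null) {
            addrs.remove(address);
        }
    }

    @Override
    public CompletionStage<List<String>> lookup(String serviceName) {
        // Return an immutable snapshot so callers cannot mutate the registry.
        return CompletableFuture.completedFuture(
                List.copyOf(registry.getOrDefault(serviceName, List.of())));
    }
}
```

Making `lookup` asynchronous matters in practice: a real registry would be a network call, and discovery should not block the caller while it waits.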

Configuring and Fine-Tuning Discovery Strategies

Configuring node discovery in Akka Clusters involves specifying the discovery method, configuring network settings, and fine-tuning parameters. Here are some key considerations:

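Discovery Method: Choose the appropriate discovery method based on your network architecture and scalability requirements, then configure Akka to use DNS, another built-in method, or a custom provider. Akka does not ship a multicast-based method, so multicast discovery would have to be implemented as a custom provider.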

Networking Settings: Configure network interfaces, ports, and addresses to enable node communication. Ensure that nodes can reach each other over the network.

Tuning Parameters: Adjust timeouts, intervals, and retry mechanisms to optimize node discovery and react promptly to changes in the cluster's membership.
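For the tuning point, a common pattern for retrying a failed discovery lookup or join attempt is capped exponential backoff. The sketch below only computes the delay schedule; the class name and the example values (500 ms initial delay, doubling, 5 s cap) are assumptions for illustration. Akka has its own related settings, for example `akka.cluster.seed-node-timeout` and `akka.cluster.retry-unsuccessful-join-after`; check the Akka reference configuration for their exact behaviour.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

// Capped exponential backoff: each retry waits `multiplier` times longer
// than the previous one, but never longer than `max`.
class RetrySchedule {
    static List<Duration> delays(Duration initial, double multiplier, Duration max, int attempts) {
        List<Duration> result = new ArrayList<>();
        double nextMillis = initial.toMillis();
        for (int i = 0; i < attempts; i++) {
            long capped = (long) Math.min(nextMillis, max.toMillis());
            result.add(Duration.ofMillis(capped));
            nextMillis *= multiplier;
        }
        return result;
    }
}
```

With those values the schedule is 500 ms, 1 s, 2 s, 4 s, 5 s, 5 s. Adding random jitter to each delay is a common refinement that keeps many nodes from retrying in lockstep.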

Cohesive Cluster Formation with the Gossip Protocol

Once nodes discover each other, the next step is to form a cohesive cluster. Akka Clusters use a gossip protocol for this purpose: a decentralized way for nodes to exchange information about the cluster's state and membership, which also keeps that shared view up to date as the cluster changes.

Gossip Rounds:

Nodes periodically exchange gossip information in rounds, disseminating details about their view of the cluster. This includes information about other nodes, their status, and their perceived cluster state. To explain this in simpler terms, let us look at an example:


Imagine a Group Chat:

Think of a group chat with several people, but there's no single leader or organizer. Instead, everyone in the group chats with each other independently. This is similar to how nodes in an Akka Cluster communicate.

Sharing Information:

In this group chat, everyone has a notebook where they write down important information about the group. Whenever someone learns something new or wants to share an update, they write it in their notebook.

Spreading Updates:

Now, here's the interesting part. Every so often, people in the group randomly choose one or more other people and share the updates from their notebooks. It's like sharing secrets or news with a few friends in the group.

Information Exchange:

As time goes on, everyone in the group keeps doing this, sharing updates with a few friends at random intervals. Because of this, information gradually spreads throughout the entire group.

Cohesive Cluster Formation:

In Akka Clusters, nodes work similarly. Each node keeps its own "notebook" with information about the cluster, such as which nodes are part of it. They periodically share this information with a few other nodes.

Over time, as nodes keep exchanging these updates, they all converge on a common understanding of the cluster's state and membership. This process helps form a cohesive cluster where all nodes know about each other, even if they start with incomplete information or if nodes join or leave the cluster.

So, the gossip protocol in Akka Clusters is like a group of friends chatting and sharing updates with each other randomly, ensuring that everyone eventually knows the same information about the cluster they belong to. This decentralized approach helps create a cohesive cluster without relying on a central coordinator.
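The group-chat analogy can be simulated as a toy push-pull gossip. Each node's "notebook" is the set of members it knows; every round, each node picks one random known peer and both merge their notebooks. This is a simplification: real Akka gossip carries member status and a vector clock rather than a bare set of addresses, and one round corresponds to Akka's gossip interval (`akka.cluster.gossip-interval`). The class name, the starting condition (each node knows only itself and node 0, acting as a seed), and the fixed random seed are assumptions.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

// Toy push-pull gossip over membership "notebooks" (sets of node ids).
class GossipSimulation {
    final List<Set<Integer>> views = new ArrayList<>();
    private final Random random;

    GossipSimulation(int nodes, long seed) {
        random = new Random(seed);
        for (int i = 0; i < nodes; i++) {
            Set<Integer> view = new HashSet<>();
            view.add(i);
            view.add(0); // every node starts out knowing only itself and node 0
            views.add(view);
        }
    }

    // One gossip round: every node exchanges notebooks with one random known peer.
    void round() {
        for (int self = 0; self < views.size(); self++) {
            List<Integer> peers = new ArrayList<>(views.get(self));
            peers.remove(Integer.valueOf(self));
            if (peers.isEmpty()) {
                continue; // nobody to gossip with yet
            }
            int peer = peers.get(random.nextInt(peers.size()));
            Set<Integer> merged = new HashSet<>(views.get(self));
            merged.addAll(views.get(peer));
            views.set(self, merged);
            views.set(peer, new HashSet<>(merged));
        }
    }

    // Converged: every node knows every member, so all notebooks are equal.
    boolean converged() {
        Set<Integer> first = views.get(0);
        return first.size() == views.size() && views.stream().allMatch(first::equals);
    }

    // Returns the number of rounds needed, or -1 if maxRounds was not enough.
    int runUntilConverged(int maxRounds) {
        for (int r = 1; r <= maxRounds; r++) {
            round();
            if (converged()) {
                return r;
            }
        }
        return -1;
    }
}
```

No node coordinates the others, yet every notebook ends up identical after a small number of rounds, which is the convergence property the cluster relies on.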

Leader and Membership Management:

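Akka Clusters do not hold a leader election. Once gossip has converged (every node has seen the same cluster state), each node can independently determine the leader by applying the same deterministic rule: the first eligible member in a fixed ordering of node addresses. The leader is responsible for membership transitions, such as moving joining nodes to Up and removing nodes that have left or been downed. It does not decide on its own that a failed node should be removed; unreachable nodes must first be downed, either manually or by a downing provider such as the Split Brain Resolver.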
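The deterministic leader rule can be sketched as a pure function over a converged membership view: because every node evaluates the same function on the same view, they all agree without exchanging votes. The rule below ("lowest address among reachable members that are Up or Leaving") is a simplified model; Akka's real ordering and eligibility rules are more detailed, and the class, enum, and addresses here are hypothetical.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Simplified, hypothetical model of deterministic leader selection.
class LeaderRule {
    enum Status { JOINING, UP, LEAVING, EXITING, DOWN }

    static final class Member {
        final String address;
        final Status status;
        final boolean reachable;

        Member(String address, Status status, boolean reachable) {
            this.address = address;
            this.status = status;
            this.reachable = reachable;
        }
    }

    // Every node runs this on the same converged view and gets the same answer.
    static Optional<String> leader(List<Member> view) {
        return view.stream()
                .filter(m -> m.reachable)
                .filter(m -> m.status == Status.UP || m.status == Status.LEAVING)
                .map(m -> m.address)
                .min(Comparator.naturalOrder()); // addresses compared as plain strings
    }
}
```

In this model, if the lowest-addressed node becomes unreachable, the next eligible member takes over the role at the next convergence; no election round is needed.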

Advanced Techniques for Handling Network Partitions and Split-Brain Scenarios

In distributed systems, network partitions can occur, isolating groups of nodes from each other. This can lead to split-brain scenarios where multiple independent clusters form within the network.

Quorum Systems:

To prevent split-brain scenarios, Akka Clusters often employ quorum systems. Quorums require a minimum number of nodes to agree on cluster membership changes, ensuring that only one partition continues to operate. Let us try to understand quorum systems with a real-life example:

Imagine Making Important Decisions with Friends:

Think of a group of friends who need to make important decisions together. For example, they want to decide where to go for a weekend trip.

Quorum Rule:

Now, they establish a rule: "We can only make a decision if more than half of us agree." In this group of friends, there are 10 people, so at least 6 friends must agree on a destination before they can make a decision. (With a rule of "at least half", two groups of 5 could each pick a different destination.)

Applying Quorums to Akka Clusters:

In Akka Clusters, nodes use a similar rule called a "quorum" when there's a network partition or split. Here's how it works:

- Imagine you have a cluster of nodes, and suddenly the network has a problem, causing the cluster to split into two groups that can't communicate.

- With the quorum rule, both groups will check how many nodes are in their group.

- If one of the groups has more than half of the total nodes, it can continue operating as the main cluster. The other group, which has less than half, knows it's in the minority and won't make decisions.

- This ensures that only the larger group continues to operate, preventing both groups from making conflicting decisions or causing data inconsistencies.
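The "more than half" rule reduces to simple arithmetic. This sketch (class and method names are hypothetical) computes the strict-majority threshold and checks whether a partition may keep operating; with a strict majority, two disjoint partitions can never both qualify.

```java
// Strict-majority quorum arithmetic.
class MajorityQuorum {
    // Smallest number of members that is MORE than half of the cluster.
    static int threshold(int clusterSize) {
        return clusterSize / 2 + 1;
    }

    // A partition may keep operating only if it reaches the threshold.
    static boolean mayContinue(int partitionSize, int clusterSize) {
        return partitionSize >= threshold(clusterSize);
    }
}
```

For the ten friends, `threshold(10)` is 6, so a 6/4 split leaves exactly one group able to decide, while a 5/5 split leaves neither group able to decide.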

In essence, quorum systems in Akka Clusters are like the "majority rules" principle used by friends to make decisions. They help maintain order and consistency when network partitions occur, ensuring that only one part of the cluster keeps working to avoid confusion and conflicts.

Split-Brain Resolver:

Akka provides the Split Brain Resolver to automatically detect and resolve split-brain scenarios when they occur. A split-brain scenario occurs when a network partition separates nodes in a distributed system and each partition believes it is the primary or authoritative side of the system. This can lead to conflicting decisions and data inconsistencies, so the Split Brain Resolver decides which side survives in order to maintain cluster integrity and minimize data inconsistency.

Here's how a Split Brain Resolver works with Akka Clusters:

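Detection of Network Partition: When a network partition occurs, Akka Cluster's failure detector notices that heartbeats from nodes on the other side have stopped arriving and marks those nodes as unreachable. The Split Brain Resolver treats this loss of communication as a possible partition and acts once the situation has been stable for a configurable period.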
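Akka's failure detector is based on the phi accrual idea. Rather than applying a hard timeout, it computes a suspicion level, phi, from how overdue the next heartbeat is compared with the history of heartbeat intervals. The sketch below assumes normally distributed intervals with a known mean and standard deviation and uses a logistic approximation of the normal CDF. It illustrates the formula and is not Akka's implementation; the class name is hypothetical. In Akka, a node is considered unreachable once phi exceeds `akka.cluster.failure-detector.threshold` (8.0 by default).

```java
// Sketch of the phi accrual suspicion level:
// phi = -log10(probability that the next heartbeat would arrive even later).
class PhiAccrual {
    static double phi(double millisSinceLastHeartbeat, double meanMillis, double stdDevMillis) {
        if (stdDevMillis <= 0) {
            throw new IllegalArgumentException("stdDevMillis must be positive");
        }
        double y = (millisSinceLastHeartbeat - meanMillis) / stdDevMillis;
        // Logistic approximation of the normal CDF: cdf(y) is about 1 / (1 + e).
        double e = Math.exp(-y * (1.5976 + 0.070566 * y * y));
        if (millisSinceLastHeartbeat > meanMillis) {
            return -Math.log10(e / (1.0 + e));        // P(later) = 1 - cdf(y)
        } else {
            return -Math.log10(1.0 - 1.0 / (1.0 + e)); // same value, avoids overflow for large negative y
        }
    }
}
```

A heartbeat that is exactly on time gives phi = log10(2), about 0.3, and each further standard deviation of delay makes phi climb steeply. This graded suspicion is what lets the detector distinguish a slow network from a dead node.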

Quorum-Based Decision-Making: Akka Clusters often employ quorum systems, as mentioned earlier. In the presence of a network partition, quorum rules are applied to determine which partition should continue operating as the "primary" cluster, and which should "down" or mark itself as not authoritative.

Marking Partitions: The Split Brain Resolver downs the nodes on the losing side. This ensures that only one side continues to operate as the cluster; the downed nodes are removed from it and, by default, shut themselves down.

Resolution Strategies: Akka's Split Brain Resolver supports several strategies for deciding which side to down, including keep-majority, static-quorum, keep-oldest, down-all, and lease-majority. Some strategies can be limited to nodes with a specific role, and you can also supply your own downing provider with custom logic.
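Two of these strategies can be sketched as small decision functions. This is not Akka's implementation: the real resolver also waits for membership to be stable, can take roles into account, and handles more cases. The class and method names are hypothetical, and addresses are compared as plain strings for simplicity, whereas Akka orders addresses by its own rules. For keep-majority, the tie-break follows the rule documented for Akka: on an exact tie, the side containing the lowest address is kept.

```java
import java.util.Comparator;
import java.util.Set;
import java.util.stream.Stream;

// Simplified decision functions in the spirit of two Split Brain Resolver strategies.
class DowningDecision {
    // keep-majority: the side with more than half of the members survives;
    // on an exact tie, the side containing the lowest address survives.
    static boolean keepMajoritySurvives(Set<String> mySide, Set<String> otherSide) {
        int total = mySide.size() + otherSide.size();
        if (total == 0) {
            return false;
        }
        if (mySide.size() * 2 > total) {
            return true;  // clear majority
        }
        if (mySide.size() * 2 < total) {
            return false; // clear minority
        }
        String lowest = Stream.concat(mySide.stream(), otherSide.stream())
                .min(Comparator.naturalOrder())
                .get();
        return mySide.contains(lowest);
    }

    // static-quorum: a side survives only if it has at least quorumSize members.
    static boolean staticQuorumSurvives(Set<String> mySide, int quorumSize) {
        return mySide.size() >= quorumSize;
    }
}
```

Both functions are evaluated independently on each side of the partition. Their value comes from the guarantee that at most one side answers "survive" for keep-majority, and for static-quorum as long as the quorum size is more than half of the cluster.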

Reconciliation: Once the network partition heals, nodes that were downed do not simply resume where they left off. In Akka Cluster, a downed node cannot rejoin as the same member; it must be restarted and join the cluster again as a new member. Reconciling application data between the two sides, by comparing states, identifying any differences, and taking corrective actions, is the application's responsibility.

Ensuring Data Consistency: The reconciliation process may involve merging data changes, rolling back conflicting transactions, or applying conflict resolution rules, depending on the nature of the application.
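One way to make "merging data changes" conflict-free is to store the data as a CRDT. The sketch below shows a grow-only counter (G-Counter): each node only increments its own slot, so two replicas that diverged during a partition can be merged by taking the per-node maximum, with no rollback needed. This is one possible technique, not something Akka Cluster performs automatically; Akka Distributed Data ships its own CRDT types, and the class here is a hypothetical illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Grow-only counter CRDT: one slot per node, merged by per-node maximum.
class GCounter {
    private final Map<String, Long> counts = new HashMap<>();

    // A node only ever increments its own slot.
    void increment(String nodeId, long delta) {
        if (delta < 0) {
            throw new IllegalArgumentException("a G-Counter can only grow");
        }
        counts.merge(nodeId, delta, Long::sum);
    }

    long value() {
        return counts.values().stream().mapToLong(Long::longValue).sum();
    }

    // Merge is commutative, associative and idempotent, so replicas can be
    // merged in any order, any number of times, and still agree.
    GCounter merge(GCounter other) {
        GCounter result = new GCounter();
        result.counts.putAll(this.counts);
        other.counts.forEach((node, n) -> result.counts.merge(node, n, Long::max));
        return result;
    }
}
```

For example, if both sides start from a shared count of 3 for node "a", side A then adds 2 for "a" and side B adds 4 for "b", merging in either order yields a total of 9.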

Conclusion

Node discovery and cluster formation are fundamental aspects of Akka Clusters that underpin the functionality and reliability of distributed systems. By understanding the mechanics of discovery, configuring discovery strategies, and leveraging the gossip protocol, you can ensure that your Akka Cluster forms a cohesive and resilient network of nodes, even in the face of network partitions and challenging scenarios. These concepts are crucial for building robust, fault-tolerant, and highly available distributed systems.

In the next blog, we'll explore the intricacies of message passing and coordination within Akka Clusters, delving into the core of distributed communication and collaboration. Stay tuned!

For any queries, feel free to contact us at hello@fusionpact.com