Cluster
The cluster implementation of the Software Appliance uses WireGuard connections for all cluster communication. This means that the cluster nodes do not need to be physically located close to each other as long as they have good network connectivity. However, this also means that a node cannot distinguish between the failure of another node and an interrupted network connection to that node. To avoid cluster nodes operating independently and receiving different data sets (a so-called split-brain situation), the cluster nodes coordinate and stop operating if they do not belong to the majority of connected nodes. This ensures that only one data set can be updated at a time. After a temporary network outage, the disconnected nodes can easily synchronize their data with the majority data set and continue to operate.
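The following sketch illustrates this majority rule in simplified form (hypothetical Python, not the appliance's actual implementation; the has_quorum helper and node names are invented for illustration):

    # Illustrative only: a node may keep operating while the partition it can
    # reach (including itself) contains a strict majority of all cluster nodes.
    def has_quorum(reachable_nodes: set, all_nodes: set) -> bool:
        """Return True if this partition holds a strict majority of the cluster."""
        return len(reachable_nodes) > len(all_nodes) / 2

    cluster = {"node1", "node2", "node3"}

    # A partition of two out of three nodes keeps operating ...
    print(has_quorum({"node1", "node2"}, cluster))  # True
    # ... while the isolated third node stops to avoid a split-brain situation.
    print(has_quorum({"node3"}, cluster))           # False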
The options on the Software Appliance Cluster page allow you to add cluster nodes, monitor an existing cluster, and manage its nodes. You can find detailed information about the cluster members and their current status. In addition, an easy-to-use locking function prevents editing conflicts.
Definition of Availability
Availability is defined as the ability to keep the service running, with full data integrity, for the applications running on the Software Appliance.
Levels of Availability
Stand-alone instance
This is a basic single-node installation of the Software Appliance. In case of a node failure, a new Software Appliance needs to be reinstalled from a backup. All data between the time of the last backup and the failure will be lost. If no cold-standby (spare) Software Appliance is available, the time required to provision a new VM must be taken into account when calculating the acceptable downtime.
Hot standby with manual fail-over
In this configuration, two nodes are connected to form a cluster. The node installed first has a higher quorum vote weight than the second node.
If the second node fails, the first node continues to operate and the second node is set to maintenance mode. If the first node fails, the second node stops operating and is set to maintenance mode as well.
To bring the second node back into operation, manual interaction via the Software Appliance's administrative interface (WebConf) is required.
Manual intervention is also required to avoid data loss: the second node should only be forced into Primary if the first node has really failed and cannot be recovered.
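As a worked example (assuming the vote weight scheme described under High Availability below, Weight = 128 − NodeNumber; the figures are illustrative, not read from a running system), the asymmetry follows from the vote arithmetic:

    # Hypothetical two-node cluster: node 1 weighs 127, node 2 weighs 126, so the
    # total vote weight is 253 and a strict majority requires more than 126.5.
    weights = {1: 128 - 1, 2: 128 - 2}    # {1: 127, 2: 126}
    majority = sum(weights.values()) / 2  # 126.5

    print(weights[1] > majority)  # True  -> node 1 alone keeps operating
    print(weights[2] > majority)  # False -> node 2 alone stops operating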
High Availability with automatic fail-over
This is a setup with three or more nodes. If one node fails, the remaining nodes can still form a cluster by a majority quorum vote and continue operation. If the failed Software Appliance is still switched on, it is set to maintenance mode.
To ensure that quorum votes never result in a tie, each node is assigned a unique quorum vote weight based on its node number (Weight = 128 − NodeNumber).
In a setup where an even number of nodes N is split equally between two sites, the site that should remain active when connectivity between the sites fails must have a larger sum of quorum vote weights than the other site.
Since cluster nodes with lower node numbers have a higher weighting, you should deploy nodes 1 to N/2 at the primary site.
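To make the arithmetic concrete, here is a small sketch (a hypothetical four-node, two-site layout; the quorum_weight helper is illustrative and not part of the product) that applies the documented formula:

    # Weight = 128 - NodeNumber, as documented above.
    def quorum_weight(node_number: int) -> int:
        return 128 - node_number

    # Hypothetical layout: N = 4 nodes split equally between two sites,
    # with nodes 1 to N/2 deployed at the primary site.
    primary_site = [1, 2]
    secondary_site = [3, 4]

    primary_weight = sum(quorum_weight(n) for n in primary_site)      # 127 + 126 = 253
    secondary_weight = sum(quorum_weight(n) for n in secondary_site)  # 125 + 124 = 249

    # If the link between the sites fails, the primary site holds the larger share
    # of the total vote weight and therefore keeps operating.
    print(primary_weight > secondary_weight)  # True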