
On-Prem High Availability (HA) Guide

This guide describes how to configure the AgileSec Platform for high availability in on-premises environments.

Overview

This guide provides instructions for the following scenarios:

  1. Setting up a highly available (no single point of failure) AgileSec platform cluster for a new installation.

  2. Adding capacity to an existing cluster to eliminate single points of failure and achieve high availability.

Note: HA in this context means the cluster remains operational if one node fails at any given time. In multi-region, multi-availability-zone, or multi-datacenter deployments (called a Stretch Cluster), HA also means that the cluster remains operational if one datacenter fails.

Prerequisites

  1. Ensure you have the installer package and your nodes meet the minimum memory, CPU, and disk requirements.

| Requirements | Backend (Minimum) | Backend (Production) | Scan-node | Frontend | Additional Frontends | Coordinator |
|---|---|---|---|---|---|---|
| CPU cores | 4 | 8 | 2 | 4 | 2 | 2 |
| Memory | 32GB | 64GB (small scan volume), 128GB (large scan volume) | 8GB | 16GB | 16GB | 16GB |
| Disk space | 50GB | 100GB (small scan volume), 200GB+ (large scan volume) | 50GB | 50GB | 50GB | 50GB |

  2. For convenient file transfers between nodes, it is recommended to set up SSH key-based passwordless authentication from backend-1 to all other nodes, though you can also transfer files manually. Certificates and configuration files are generated on backend-1 and copied to the other nodes. Use consistent file names and directory structures when copying files between machines.

  3. Familiarity with the basic 3-node (two backends, one frontend) On-Prem installation guide is recommended. The section Adding a New Frontend to an Existing Cluster assumes you already have a running cluster, such as the 3-node cluster from that guide.

Four-Node Installation for New Cluster (2 Backends, 2 Frontends)

Note: It is recommended to build the cluster nodes in the following order.

Step 1: Setup Cluster Configuration on Backend-1 (be-1)

  1. Configure the cluster topology by editing the generate_envs/multi_node_config.conf file with your environment-specific details.

BASH
cd <installer_directory>
vi generate_envs/multi_node_config.conf
  2. Add the private IPs of all four nodes (be-1, be-2, fe-1, fe-2).

  3. Uncomment the following frontend-2 entries in generate_envs/multi_node_config.conf:

  • frontend2_private_ip

  • frontend2_node_hostname

  • frontend2_node_profile

After editing, your entries should look similar to the following:

BASH
grep -e '^frontend' -e '^backend' generate_envs/multi_node_config.conf
BASH
backend1_private_ip="X.X.X.X"
backend2_private_ip="X.X.X.X"
frontend1_private_ip="X.X.X.X"
frontend2_private_ip="X.X.X.X"
backend1_node_hostname="backend-1"
backend1_node_profile="PRIMARY_FULL_BACKEND"
backend2_node_hostname="backend-2"
backend2_node_profile="FULL_BACKEND"
frontend1_node_hostname="frontend-1"
frontend1_node_profile="PRIMARY_FRONTEND"
frontend2_node_hostname="frontend-2"
frontend2_node_profile="ADDITIONAL_FRONTEND"
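Before generating files, it can help to confirm that no active node entry still holds the X.X.X.X placeholder or an empty value. The following is a minimal sketch, assuming the key="value" layout shown above; check_private_ips is an illustrative name, not part of the installer.

```shell
# Sketch: report active *_private_ip entries that are empty or still "X.X.X.X".
# Commented-out lines (starting with #) are skipped because of the ^ anchor.
check_private_ips() {
  local conf="${1:-generate_envs/multi_node_config.conf}"
  local bad
  if [ ! -r "$conf" ]; then
    echo "cannot read $conf" >&2
    return 2
  fi
  bad=$(grep -E '^[a-z0-9]+_private_ip="(X\.X\.X\.X)?"' "$conf")
  if [ -n "$bad" ]; then
    echo "Private IPs not yet set in $conf:" >&2
    echo "$bad" >&2
    return 1
  fi
  echo "All active *_private_ip entries in $conf are set."
}

# Usage, from <installer_directory>:
#   check_private_ips generate_envs/multi_node_config.conf
```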

Step 2: Generate Configuration Files for All Nodes

The following command will generate four configuration files, one for each node:

BASH
./generate_envs/generate_envs.sh -t multi-node

Generated files will be:

  • ./generate_envs/generated_envs/env.backend-1

  • ./generate_envs/generated_envs/env.backend-2

  • ./generate_envs/generated_envs/env.frontend-1

  • ./generate_envs/generated_envs/env.frontend-2

The generate_envs.sh script will also copy the env.backend-1 file to <installer_directory>/.env.
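As a quick sanity check, the sketch below confirms that one env file exists per node hostname before you continue. The function name and the ENV_DIR override are hypothetical; the default directory is the one listed above.

```shell
# Sketch: confirm generate_envs.sh wrote one env file per node hostname.
# ENV_DIR is a hypothetical override; the default is the directory listed above.
check_generated_envs() {
  local dir="${ENV_DIR:-./generate_envs/generated_envs}"
  local missing=0 host
  for host in "$@"; do
    if [ -f "$dir/env.$host" ]; then
      echo "ok       $dir/env.$host"
    else
      echo "MISSING  $dir/env.$host" >&2
      missing=$((missing + 1))
    fi
  done
  [ "$missing" -eq 0 ]
}

# Usage, from <installer_directory>:
#   check_generated_envs backend-1 backend-2 frontend-1 frontend-2
#   cmp -s .env ./generate_envs/generated_envs/env.backend-1 && echo ".env matches env.backend-1"
```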

Step 3: Setup Certificates

Run the following to generate all certificates and self-sign them using the <installer_directory>/certificates/generate_certs.sh script. Alternatively, you can use certificates generated by your own CA. For POCs and first-time installations, it is recommended to generate all certificates using generate_certs.sh.

BASH
cd certificates/
./generate_certs.sh

The ./generate_certs.sh script will also create a file called kf-agilesec.internal-certs.tgz, which needs to be copied to all other nodes. This file conveniently contains the env.backend-1, env.backend-2, env.frontend-1, and env.frontend-2 files in addition to the certificates.

Step 4: Copy Files to All Other Nodes

Copy files to all other nodes from backend-1. For each node, copy the kf-agilesec.internal-certs.tgz file to <installer_directory>/certificates/.

BASH
scp kf-agilesec.internal-certs.tgz $BE2_IP:<installer_directory>/certificates/
scp kf-agilesec.internal-certs.tgz $FE1_IP:<installer_directory>/certificates/
scp kf-agilesec.internal-certs.tgz $FE2_IP:<installer_directory>/certificates/
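With more than a few nodes, the copy can be looped. This is a sketch only: copy_bundle and DRY_RUN are illustrative names, and it assumes the passwordless SSH setup from the prerequisites and that you run it from <installer_directory>/certificates on backend-1.

```shell
# Sketch: copy the certificate bundle from backend-1 to several nodes in one loop.
# Run from <installer_directory>/certificates; set DRY_RUN=1 to only print the commands.
copy_bundle() {
  local installer_dir="$1"; shift
  local bundle="kf-agilesec.internal-certs.tgz" ip
  for ip in "$@"; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "scp $bundle $ip:$installer_dir/certificates/"
    else
      scp "$bundle" "$ip:$installer_dir/certificates/" || return 1
    fi
  done
}

# Usage (IP variables are placeholders):
#   copy_bundle <installer_directory> "$BE2_IP" "$FE1_IP" "$FE2_IP"
```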

Step 5: External FQDN Setup (Optional)

DNS-based load balancing for frontends can provide a stable external FQDN for both installation and post-install operations. The steps below configure DNS using AWS Route53 as the DNS provider; other DNS providers that support DNS-based load balancing can be configured similarly.

  1. Assuming you own <external domain>, create a public hosted zone for <external domain> in AWS Route53 and follow the steps below to configure the external FQDN in that zone.

  2. For each frontend IP address in your public hosted zone for <external domain>, create a DNS entry for <analytics_hostname> with "Record type = A" and "Routing policy = Multivalue Answer" with the value of your frontend IP address. You can set "Record ID" to any value, e.g., "fe-1" for frontend-1, "fe-2" for frontend-2.

  3. Validate that your DNS configuration has propagated with the following command. The number of entries returned should equal the number of your frontend nodes:

BASH
dig +short <analytics_hostname>.<external domain> A
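The Route53 records and the dig check above can also be scripted. The sketch below assumes the AWS CLI is configured; the helper names, the fe-N=IP argument format, and the 60-second TTL are illustrative choices, while SetIdentifier and MultiValueAnswer are the Route53 API fields behind the console's "Record ID" and "Multivalue Answer" settings.

```shell
# Sketch: build a Route53 change batch with one multivalue A record per frontend.
# Each record needs a unique SetIdentifier (the console's "Record ID").
route53_multivalue_batch() {
  local fqdn="$1"; shift
  local sep="" entry id ip
  printf '{"Changes":['
  for entry in "$@"; do              # entries look like fe-1=10.0.1.10
    id="${entry%%=*}"; ip="${entry#*=}"
    printf '%s{"Action":"UPSERT","ResourceRecordSet":{"Name":"%s","Type":"A","SetIdentifier":"%s","MultiValueAnswer":true,"TTL":60,"ResourceRecords":[{"Value":"%s"}]}}' \
      "$sep" "$fqdn" "$id" "$ip"
    sep=","
  done
  printf ']}\n'
}

# Count IPv4 answers in `dig +short` output (CNAME lines are ignored).
count_ipv4_answers() {
  grep -cE '^[0-9]{1,3}(\.[0-9]{1,3}){3}$'
}

# Usage (zone ID, FQDN and IPs are placeholders):
#   route53_multivalue_batch agilesec.kf-agilesec.com fe-1=10.0.1.10 fe-2=10.0.2.10 > batch.json
#   aws route53 change-resource-record-sets --hosted-zone-id <zone-id> --change-batch file://batch.json
#   dig +short agilesec.kf-agilesec.com A | count_ipv4_answers   # should equal the frontend count
```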

Step 6: Install on Backend-1 (BE-1)

  1. Ensure you have either a DNS entry for <analytics_hostname>.<external domain> (recommended for HA) pointing to frontend IPs or an entry in your /etc/hosts for <analytics_hostname>.<external domain> pointing to either frontend-1 or frontend-2’s IP address.

  2. Since all environment files were generated on BE-1, there should already be a .env file at <installer_directory>/.env on BE-1. Run the following to install the software on BE-1:

BASH
cd <installer_directory>
sudo ./scripts/tune.sh -u <user>
./install_analytics.sh -u <user> -p <installation-dir>

Note: <installation-dir> is the directory where the installed files will reside.

Step 7: Install on Backend-2 (BE-2)

  1. Ensure you have either a DNS entry for <analytics_hostname>.<external domain> (recommended for HA) pointing to the frontend IPs, or an entry in /etc/hosts for <analytics_hostname>.<external domain> pointing to frontend-1's or frontend-2's IP address. If you have an upstream load balancer, point the entry at the load balancer's IP address instead.

  2. On BE-2, run the following to unarchive the files, copy the .env to the <installer_directory> root, and install the software:

BASH
cd <installer_directory>/certificates/
tar zxvf kf-agilesec.internal-certs.tgz

cp env.backend-2 ../.env

cd ../
sudo ./scripts/tune.sh -u <user>
./install_analytics.sh -u <user> -p <installation-dir>

Step 8: Install on Frontend-1 (FE-1)

If your DNS provider does not resolve the FQDN on FE-1, add an entry to /etc/hosts on FE-1 that maps your <analytics_hostname>.<external domain> to FE-1's IP address, for example: <FE-1_IP> agilesec.kf-agilesec.com.
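Because a similar hosts entry is needed on several nodes (and the installer may be re-run), an idempotent helper avoids duplicate lines. A minimal sketch; ensure_hosts_entry is a hypothetical name, and the optional third argument exists only so it can be tried on a scratch file. Run it from a root shell, since it appends to /etc/hosts by default.

```shell
# Sketch: append "<ip> <fqdn>" to a hosts file only if the FQDN is not already mapped.
ensure_hosts_entry() {
  local ip="$1" fqdn="$2" hosts="${3:-/etc/hosts}"
  # Skip comment lines; match the FQDN as a whole whitespace-separated field.
  if awk -v h="$fqdn" '$1 !~ /^#/ { for (i = 2; i <= NF; i++) if ($i == h) found = 1 }
                       END { exit (found ? 0 : 1) }' "$hosts"; then
    echo "$fqdn already present in $hosts; leaving it unchanged"
  else
    printf '%s %s\n' "$ip" "$fqdn" >> "$hosts"
    echo "added: $ip $fqdn"
  fi
}

# Usage on FE-1 (placeholder IP and FQDN):
#   ensure_hosts_entry 10.0.1.10 agilesec.kf-agilesec.com
```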

Run the following to unarchive the files, copy the .env to the <installer_directory> root, and install the software:

BASH
cd <installer_directory>/certificates/
tar zxvf kf-agilesec.internal-certs.tgz

cp env.frontend-1 ../.env

cd ../
sudo ./scripts/tune.sh -u <user>
./install_analytics.sh -u <user> -p <installation-dir> -v

At the end of installation, the installer will provide the following access details:

  • Access information for Web UI

  • Login URL

  • Admin username - Password (as provided during installation)

  • Ingestion service endpoint for v3 unified sensor

  • Ingestion endpoint for v2 sensors

Step 9: Install on Frontend-2 (FE-2)

If your DNS provider does not resolve the FQDN on FE-2, add an entry to /etc/hosts on FE-2 that maps your <analytics_hostname>.<external domain> to FE-2's IP address, for example: <FE-2_IP> agilesec.kf-agilesec.com.

Run the following to unarchive the files, copy the .env to the <installer_directory> root, and install the software:

BASH
cd <installer_directory>/certificates/

tar zxvf kf-agilesec.internal-certs.tgz
cp env.frontend-2 ../.env

cd ../
sudo ./scripts/tune.sh -u <user>
./install_analytics.sh -u <user> -p <installation-dir>
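Steps 7 to 9 repeat the same unpack-and-copy sequence with a different env file. The sketch below captures just that part, assuming the bundle sits in <installer_directory>/certificates and contains env.<node-hostname> at its top level, as the commands above imply; prepare_node_env is an illustrative name, and it does not run tune.sh or the installer.

```shell
# Sketch: unpack the bundle and install env.<node> as <installer_directory>/.env.
prepare_node_env() {
  local installer_dir="$1" node="$2"
  local certs="$installer_dir/certificates"
  tar -zxf "$certs/kf-agilesec.internal-certs.tgz" -C "$certs" || return 1
  if [ ! -f "$certs/env.$node" ]; then
    echo "env.$node not found in the bundle" >&2
    return 1
  fi
  cp "$certs/env.$node" "$installer_dir/.env"
}

# Usage on FE-2 (then run tune.sh and install_analytics.sh as shown above):
#   prepare_node_env <installer_directory> frontend-2
```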

Adding a New Frontend to an Existing Cluster

Assumption: This section assumes you already have an existing working cluster with at least one frontend node. For this specific example, we assume you have a 4-node (BE-1, BE-2, FE-1, FE-2) working cluster set up in the previous section. This example can also be applied if you have a 3-node working cluster with BE-1, BE-2, and FE-1 and are adding FE-2.

To add a new frontend node called frontend-3 (FE-3), follow these steps:

Step 1: On BE-1, Add a New Frontend-3 Configuration Block

A. Ensure <installer_directory>/generate_envs/multi_node_config.conf contains the following entries for FE-3, with frontend3_private_ip set to FE-3's private IP address:

BASH
frontend3_node_hostname="frontend-3"
frontend3_private_ip="X.X.X.X"
frontend3_node_profile="ADDITIONAL_FRONTEND"

B. After completing the above step, the frontend entries should look like this (backend entries omitted):

BASH
grep -e '^frontend' -e '^backend' generate_envs/multi_node_config.conf
BASH
frontend1_node_hostname="frontend-1"
frontend1_private_ip="X.X.X.X"
frontend1_node_profile="PRIMARY_FRONTEND"
frontend2_node_hostname="frontend-2"
frontend2_private_ip="X.X.X.X"
frontend2_node_profile="ADDITIONAL_FRONTEND"
frontend3_node_hostname="frontend-3"
frontend3_private_ip="X.X.X.X"
frontend3_node_profile="ADDITIONAL_FRONTEND"

Note: The difference between PRIMARY_FRONTEND and ADDITIONAL_FRONTEND is that PRIMARY_FRONTEND also runs the quorum-coordinating services: the MongoDB arbiter, the OpenSearch cluster manager, and the Kafka controller.

Step 2: Generate Configuration Files for Each Node

A. Run ./generate_envs/generate_envs.sh -t multi-node to regenerate the following files:

  • <installer_directory>/generate_envs/generated_envs/env.backend-1

  • <installer_directory>/generate_envs/generated_envs/env.backend-2

  • <installer_directory>/generate_envs/generated_envs/env.frontend-1

  • <installer_directory>/generate_envs/generated_envs/env.frontend-2

  • <installer_directory>/generate_envs/generated_envs/env.frontend-3

Step 3: From BE-1, Copy All Frontend Configuration Files

Copy all frontend configuration files to their respective frontend machines:

BASH
scp <installer_directory>/generate_envs/generated_envs/env.frontend-1 \
  $FE1_IP:<installer_directory>/.env

scp <installer_directory>/generate_envs/generated_envs/env.frontend-2 \
  $FE2_IP:<installer_directory>/.env

scp <installer_directory>/generate_envs/generated_envs/env.frontend-3 \
  $FE3_IP:<installer_directory>/.env

Copy the certificate bundle kf-agilesec.internal-certs.tgz to FE-3:

BASH
scp <installer_directory>/certificates/kf-agilesec.internal-certs.tgz \
  $FE3_IP:<installer_directory>/certificates

Step 4: Install FE-3

If your DNS provider does not resolve the FQDN on FE-3, add an entry to /etc/hosts on FE-3 that maps your <analytics_hostname>.<external domain> to FE-3's IP address, for example: <FE-3_IP> agilesec.kf-agilesec.com.

Run the following to install FE-3:

BASH
cd <installer_directory>/certificates/
tar zxvf kf-agilesec.internal-certs.tgz

cd ../
sudo ./scripts/tune.sh -u <user>
./install_analytics.sh -u <user> -p <installation-dir>

Step 5: Patch the Existing Frontends (FE-1 and FE-2)

A. On FE-1

BASH
cd <installer_directory>

./install_analytics.sh -u <user> -p <installation-dir> patch new-frontend -v

sudo ./scripts/tune.sh -u <user>

B. On FE-2

BASH
cd <installer_directory>

./install_analytics.sh -u <user> -p <installation-dir> patch new-frontend -v

sudo ./scripts/tune.sh -u <user>

Adding an Upstream Load Balancer

An upstream load balancer distributes external traffic across your frontend nodes, providing a single stable entry point for the platform. You can use either a Layer 4 (TCP) or Layer 7 (Application/HTTPS) load balancer depending on your requirements.

| | Layer 4 (TCP) | Layer 7 (HTTPS) |
|---|---|---|
| SSL handling | Passthrough - LB forwards encrypted traffic as-is | Terminates TLS and optionally re-encrypts to backend |
| URL-based routing | No | Yes - can route by URL path |
| Example products | AWS NLB, Azure Load Balancer | AWS ALB, Azure Application Gateway |

Note: The examples in this section use port 8443 or 443, but frontends can be configured to use different ports. Adjust the port numbers according to your configuration.

Layer 4 (TCP) Load Balancer

A Layer 4 load balancer operates at the TCP level, forwarding encrypted traffic as-is to the backend without inspecting or modifying it. No TLS certificate is needed on the load balancer itself. The backend's TLS certificate is presented directly to the client. This is the simpler option when you do not need URL-based routing or TLS inspection at the load balancer.

Generic Configuration Steps

  1. Create a target group or backend pool with your frontend node(s) registered on the frontend port (e.g., 8443 or 443).

  2. Configure health checks - TCP port check (simple, confirms port is open) or HTTP/HTTPS check to /health-check (preferred, confirms HAProxy is responding).

  3. Create the L4 load balancer with a TCP listener on the frontend port.

  4. Forward the listener to the target group/backend pool.

  5. Lock down frontend firewall rules - only allow traffic from the LB's security group or subnet, so clients cannot bypass the LB and hit HAProxy directly.

  6. Point DNS to the LB's endpoint.

  7. Validate - log in and run a smoke test through the LB.

Key Considerations

  • No TLS certificate is needed on the LB. Traffic passes through encrypted as-is to the backend.

  • The backend's TLS certificate (self-signed or public CA) is presented directly to the client. The browser will see the certificate configured at HAProxy.

  • TCP health checks are simpler but less informative. They only confirm the port is open, not whether the application is responding correctly. Use HTTP/HTTPS health checks to /health-check for better reliability.

  • Ensure the OS-level firewall (firewalld on RHEL) is disabled or allows traffic on the frontend port.

Example: AWS Network Load Balancer with port 8443

The following steps demonstrate configuring an AWS NLB to forward TCP traffic to HAProxy on the frontend nodes. The same concepts apply to other cloud providers or on-premises Layer 4 TCP load balancers.

Prerequisites on the Frontend Nodes
  1. Confirm HAProxy is listening on 0.0.0.0:8443 (or the instance's private IP on 8443).

  2. Ensure each node is reachable on port 8443 from within the network.

  3. Decide how you want health checks to work:

    • Easiest: TCP health check on 8443 (checks if port is open)

    • Better: HTTP/HTTPS health check to an HAProxy endpoint (checks if HAProxy is working). HAProxy frontends can respond to the /health-check endpoint.

Step 1: Create an AWS Target Group for the Frontend Nodes

In EC2 Console → Target Groups → Create target group:

  1. Target type

    • Select Instances (typical for EC2) or IP (if you want to register IPs directly)

  2. Protocol / Port

    • Protocol: TCP

    • Port: (e.g., 8443 or 443)

  3. Health checks

    • Protocol: TCP (simple) or HTTP/HTTPS (preferred, since we have the /health-check URL)

    • Port: Traffic port (same as your frontend port)

  4. Create the target group, then Register targets:

    • Add your frontend instances (or IPs)

    • After registering, check the Targets tab → Health status to ensure they become healthy

Step 2: Create the Network Load Balancer (NLB)

In EC2 Console → Load Balancers → Create load balancer → Network Load Balancer:

  1. Scheme: Internet-facing (public) or Internal (private-only) based on your organizational policy and needs

  2. IP address type: IPv4

  3. Network mapping:

    • Select the VPC

    • Select the subnet in which your frontend VMs reside

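  4. Optional: Create or choose an NLB security group that allows inbound TCP traffic on your frontend port from the sources you want to admit (0.0.0.0/0 for public access, your corporate CIDRs, etc.). Security groups can only be associated with an NLB when it is created, and the recommended lockdown in Step 4 relies on this group.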

Step 3: Add the Listener and Attach the Target Group

While creating the NLB (or afterward):

  1. Listener

    • Protocol: TCP

    • Port: (e.g., 8443 or 443)

  2. Default action

    • Forward to the target group you created in the previous step

Step 4: Lock Down the HAProxy Instances' Security Group (Important)

On the HAProxy instances' security group, ensure inbound rules allow:

  • TCP traffic on your frontend port (e.g., 8443 or 443) from the NLB security group (recommended), so clients cannot hit HAProxy directly

  • The health check port (same port if using traffic-port health checks)

Step 5: Validate
  1. Either point your external FQDN to the NLB (recommended) or update your /etc/hosts to point to the NLB IP address for local testing.

  2. Log in to https://<analytics_hostname>.<external domain>:<your_frontend_port> and run a network scan as a smoke test. For smoke test execution details, see either the single-node or multi-node installation guide.
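Before changing DNS, you can check reachability through the NLB while still using the real FQDN, so that TLS SNI and the Host header match what clients will send. This sketch uses curl's --resolve option; the function name and the /signin path (borrowed from the Layer 7 health probe guidance) are assumptions, and it complements rather than replaces the scan smoke test.

```shell
# Sketch: reach the platform through the NLB before any DNS change.
# --resolve pins FQDN:port to the LB address, so SNI and Host still carry the FQDN.
# -k because L4 passthrough presents HAProxy's (possibly self-signed) certificate.
smoke_test_via_lb() {
  local fqdn="$1" lb_ip="$2" port="${3:-8443}" code
  code=$(curl -sk -o /dev/null --max-time 10 -w '%{http_code}' \
    --resolve "$fqdn:$port:$lb_ip" "https://$fqdn:$port/signin")
  echo "https://$fqdn:$port/signin via $lb_ip: HTTP $code"
  [ "$code" -ge 200 ] && [ "$code" -lt 400 ]
}

# Usage (placeholders; an NLB DNS name resolves to one IP per enabled subnet):
#   lb_ip=$(dig +short <nlb-dns-name> A | head -n 1)
#   smoke_test_via_lb agilesec.kf-agilesec.com "$lb_ip" 8443
```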

Layer 7 (Application / HTTPS) Load Balancer

A Layer 7 load balancer operates at the HTTP/HTTPS level, providing SSL termination, URL-based routing, HTTP-aware health probes, and optional WAF (Web Application Firewall) capabilities. This is recommended when you need TLS inspection, path-based routing, or advanced health checking.

When using a Layer 7 load balancer with the AgileSec platform, you must decide how SSL/TLS is handled between the load balancer and the AgileSec frontend nodes (the load balancer's backend targets):

| Option | How it works | Pros | Cons |
|---|---|---|---|
| End-to-end SSL (recommended) | LB terminates incoming TLS, re-encrypts traffic to backend on HTTPS | Backend stays on HTTPS, no application changes needed | Requires uploading the backend CA cert to the LB for trust |
| SSL offloading | LB terminates TLS, forwards plain HTTP to backend | Simpler LB configuration | Requires changing AgileSec backend to accept HTTP (not recommended) |

Recommended: End-to-end SSL. The AgileSec backend continues to run on HTTPS, and the LB re-encrypts traffic to the backend. If the backend uses a self-signed or private CA certificate, the LB must be able to trust it, which requires uploading the AgileSec root CA certificate to the LB.

Generic Configuration Steps

Regardless of the cloud provider or load balancer product, the following steps are required:

  1. Provision a public IP for the load balancer's frontend endpoint.

  2. Obtain a TLS certificate for the LB's public-facing listener. Use a public CA certificate for production, or a self-signed certificate for testing.

  3. Export the AgileSec root CA certificate from a frontend node (self-signed/private CA certificates only). It is located at <installation-dir>/certificates/ca/agilesec-rootca-cert.pem and lets the LB trust the backend's self-signed TLS certificate during the HTTPS handshake.

  4. Create the Layer 7 load balancer with:

    • A frontend HTTPS listener on port 443 with the frontend TLS certificate

    • A backend pool pointing to the frontend node(s) private IP(s) on the configured frontend port (e.g., 443 or 8443)

    • Backend protocol set to HTTPS (for end-to-end SSL)

  5. Upload the AgileSec root CA as a trusted backend root certificate on the LB and associate it with the backend HTTP settings.

  6. Configure an HTTPS health probe to a known endpoint (e.g., /signin). Important: The probe's Host header must match the backend certificate's CN (typically <analytics_hostname>.<analytics_domain>), not the backend's IP address. A mismatch causes the SSL handshake to fail and the backend to appear unhealthy.

  7. Update firewall rules to allow traffic from the LB's subnet/security group to the frontend nodes on the frontend port.

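  8. Open the frontend port in the OS-level firewall on each frontend node. On RHEL, either disable firewalld or allow the port, for example with sudo firewall-cmd --add-port=443/tcp --permanent followed by sudo firewall-cmd --reload.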

  9. Point DNS to the load balancer's public IP address.

  10. Validate access to the platform URL, log in, and run a smoke test.

Key Considerations

  • 502 Bad Gateway: This typically means the health probe is failing. Check the backend health status on the LB dashboard or via CLI. The most common cause is a certificate CN mismatch (see step 6 above).

  • Certificate CN mismatch: The LB connects to backends using their private IP address, but the backend TLS certificate has a hostname-based CN (e.g., agilesec.kf-agilesec.com). The health probe and backend HTTP settings must send the correct Host header matching the certificate CN. It should not be the IP address.

  • Two-layer firewall: Both the cloud-level firewall (security groups, NSGs) AND the OS-level firewall (firewalld on RHEL) must allow traffic on the frontend port.

  • Dedicated subnet: Some cloud L7 load balancers (e.g., Azure Application Gateway) require their own dedicated subnet that cannot be shared with other resources.

  • Health probe endpoint: Use /signin (returns HTTP 200-399) or /health-check available on HAProxy.
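Most 502 errors come down to the CN mismatch described above. The sketch below reads the CN from a certificate and repeats the load balancer's backend handshake by hand: it connects to the private IP while sending the CN as SNI and verifying the chain against the AgileSec root CA. Function names are illustrative, and it assumes OpenSSL 1.1.0 or later for -verify_hostname.

```shell
# Sketch: read a certificate's CN, and repeat the LB's backend handshake by hand.
cert_cn() {
  # Reads a PEM certificate (or s_client output) on stdin; prints the subject CN.
  openssl x509 -noout -subject -nameopt multiline | sed -n 's/^ *commonName *= *//p'
}

verify_backend_cert() {
  # Connect to the private IP, but send the CN as SNI and check the name against it,
  # trusting only the AgileSec root CA (as the LB does after the root-cert upload).
  local ip="$1" port="$2" cn="$3" ca="$4"
  openssl s_client -connect "$ip:$port" -servername "$cn" -verify_hostname "$cn" \
    -CAfile "$ca" </dev/null 2>/dev/null | grep -m1 'Verify return code'
}

# Usage (placeholders):
#   openssl s_client -connect <frontend-private-ip>:443 </dev/null 2>/dev/null | cert_cn
#   verify_backend_cert <frontend-private-ip> 443 agilesec.kf-agilesec.com \
#     <installation-dir>/certificates/ca/agilesec-rootca-cert.pem
# "Verify return code: 0 (ok)" means name and chain check out; 62 is a hostname mismatch.
```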

Example: Azure Application Gateway with port 443

The following example demonstrates setting up an Azure Application Gateway (Standard_v2) with end-to-end SSL and self-signed certificates. The same concepts apply to AWS ALB, GCP HTTPS Load Balancer, or on-premises solutions such as NGINX and F5.

Variables

Set the following variables based on your Azure environment:

BASH
RESOURCE_GROUP="<resource-group>"
LOCATION="<azure-region>"
VNET_NAME="<vnet-name>"
SUBNET_APPGW="<dedicated-appgw-subnet>"
NSG_NAME="<nsg-name>"
APPGW_NAME="<appgw-name>"
VM_PRIVATE_IP="<frontend-node-private-ip>"
BACKEND_CERT_CN="<analytics_hostname>.<analytics_domain>"  # e.g., agilesec.kf-agilesec.com
Step 1: Create a public IP for the Azure Application Gateway
BASH
az network public-ip create \
  --resource-group $RESOURCE_GROUP \
  --name ${APPGW_NAME}-pip \
  --sku Standard \
  --allocation-method Static

APPGW_PIP=$(az network public-ip show --resource-group $RESOURCE_GROUP --name ${APPGW_NAME}-pip --query ipAddress -o tsv)
Step 2: Create a frontend TLS certificate (PFX)

Azure Application Gateway requires the frontend listener certificate in PFX (PKCS#12) format.

BASH
# Self-signed for testing (use a public CA cert for production):
openssl req -x509 -nodes -days 365 -newkey rsa:2048 \
  -keyout appgw.key -out appgw.crt \
  -subj "/CN=$BACKEND_CERT_CN"
openssl pkcs12 -export -out appgw.pfx -inkey appgw.key -in appgw.crt -password pass:AppGwPass123
Step 3: Export the AgileSec root CA for backend trust
BASH
scp <user>@<frontend-node>:<installation-dir>/certificates/ca/agilesec-rootca-cert.pem ./backend-trusted-root.cer
Step 4: Create the Application Gateway
BASH
az network application-gateway create \
  --resource-group $RESOURCE_GROUP \
  --name $APPGW_NAME \
  --location $LOCATION \
  --vnet-name $VNET_NAME \
  --subnet $SUBNET_APPGW \
  --sku Standard_v2 \
  --capacity 1 \
  --public-ip-address ${APPGW_NAME}-pip \
  --http-settings-port 443 \
  --http-settings-protocol Https \
  --frontend-port 443 \
  --servers $VM_PRIVATE_IP \
  --cert-file appgw.pfx \
  --cert-password "AppGwPass123" \
  --priority 100

Note: Application Gateway provisioning can take 10-20 minutes. The --servers flag accepts multiple space-separated IPs, so you can list all frontend nodes.

Step 5: Configure the backend trusted root certificate
BASH
# Upload the AgileSec root CA certificate to the Application Gateway as a trusted root
# certificate. This tells the App GW to trust any backend certificate signed by this CA,
# which the end-to-end SSL handshake needs when the App GW connects to the backend over
# HTTPS. Without it, the App GW rejects the backend's self-signed/private CA certificate
# and returns 502.
az network application-gateway root-cert create \
  --resource-group $RESOURCE_GROUP \
  --gateway-name $APPGW_NAME \
  --name agilesec-backend-ca \
  --cert-file backend-trusted-root.cer

# Update HTTP settings to use the trusted root cert
az network application-gateway http-settings update \
  --resource-group $RESOURCE_GROUP \
  --gateway-name $APPGW_NAME \
  --name appGatewayBackendHttpSettings \
  --protocol Https \
  --port 443 \
  --host-name-from-backend-pool false \
  --root-certs agilesec-backend-ca
Step 6: Configure the HTTPS health probe

The --host value must match the backend certificate's CN to avoid certificate mismatch errors.

BASH
az network application-gateway probe create \
  --resource-group $RESOURCE_GROUP \
  --gateway-name $APPGW_NAME \
  --name agilesec-health-probe \
  --protocol Https \
  --host $BACKEND_CERT_CN \
  --path "/signin" \
  --interval 30 \
  --timeout 30 \
  --threshold 3 \
  --match-status-codes "200-399"

# Associate probe with backend HTTP settings
az network application-gateway http-settings update \
  --resource-group $RESOURCE_GROUP \
  --gateway-name $APPGW_NAME \
  --name appGatewayBackendHttpSettings \
  --probe agilesec-health-probe
Step 7: Update firewall rules
BASH
# Get the App Gateway subnet CIDR
APPGW_SUBNET_CIDR=$(az network vnet subnet show \
  --resource-group $RESOURCE_GROUP \
  --vnet-name $VNET_NAME \
  --name $SUBNET_APPGW \
  --query addressPrefix -o tsv)

# Allow App GW subnet to reach frontend nodes on port 443
az network nsg rule create \
  --resource-group $RESOURCE_GROUP \
  --nsg-name $NSG_NAME \
  --name AllowAppGW \
  --priority 120 \
  --source-address-prefixes $APPGW_SUBNET_CIDR \
  --destination-port-ranges 443 \
  --access Allow --protocol Tcp --direction Inbound

# Allow Azure App GW infrastructure management traffic (required for v2 on the NSG associated with the App GW subnet)
az network nsg rule create \
  --resource-group $RESOURCE_GROUP \
  --nsg-name $NSG_NAME \
  --name AllowAppGWInfra \
  --priority 130 \
  --source-address-prefixes GatewayManager \
  --destination-port-ranges 65200-65535 \
  --access Allow --protocol Tcp --direction Inbound
Step 8: Open port 443 in the OS firewall (if firewalld is enabled)

On each frontend node:

BASH
sudo firewall-cmd --add-port=443/tcp --permanent
sudo firewall-cmd --reload
Step 9: Validate

Check backend health:

BASH
az network application-gateway show-backend-health \
  --resource-group $RESOURCE_GROUP \
  --name $APPGW_NAME \
  --query "backendAddressPools[0].backendHttpSettingsCollection[0].servers[0]" -o json

The health field should show Healthy. Then:

  1. Point your DNS A record to the Application Gateway public IP ($APPGW_PIP).

  2. Access https://<analytics_hostname>.<analytics_domain> in a browser and verify the login page loads.

  3. Log in and run a network scan as a smoke test. For smoke test details, see the single-node or multi-node installation guide.

  4. Test sensor connectivity:

    BASH
    curl -k -s -o /dev/null -w '%{http_code}\n' https://<analytics_hostname>.<analytics_domain>/v1/oauth2/token
    # Should print 405 (Method Not Allowed, since the endpoint expects POST); this confirms ingestion routing works

Adding a New Scan Node to an Existing Cluster

Scan nodes are stateless, asynchronous worker nodes: they subscribe to Kafka topics to receive scan requests, execute the scans, and publish the results back to Kafka. Scan nodes run only the HAProxy and Scheduler services.

If you want to decouple or distribute scan operations on separate nodes, you can provision one or more scan nodes and run the following installation steps on each scan node. Scan nodes use the env.backend-1 file for configuration.

Step 1: From Backend-1, Copy Certificates and Configuration to Scan Node

BASH
scp <installer_directory>/certificates/kf-agilesec.internal-certs.tgz \
  $SN1_IP:<installer_directory>/certificates

Step 2: Install Scan Node

BASH
cd <installer_directory>/certificates/
tar zxvf kf-agilesec.internal-certs.tgz

cp env.backend-1 ../.env

cd ../
sudo ./scripts/tune.sh -u <user> -r scan
./install_analytics.sh -u <user> -p <installation-dir> -r scan

Note: Both tune.sh and install_analytics.sh require the special flag -r scan for scan node installations.

Once you have one or more scan nodes, you have the option to permanently disable the Scheduler service on backend-1 and backend-2.

Multi-Region Stretch Cluster with Coordinator Node

When deploying across multiple regions, availability zones (AZs), or datacenters (DCs), you need an odd number of quorum-participating nodes to maintain a majority during an AZ/DC failure. The COORDINATOR profile provides a quorum tiebreaker node that runs only cluster management services without the overhead of a full backend or frontend.

In a standard topology, the PRIMARY_FRONTEND runs quorum services (MongoDB Arbiter, Kafka Controller, OpenSearch Cluster Manager) alongside the user-facing services (API, WebUI, Dashboards). The COORDINATOR profile splits these quorum services onto a dedicated node, allowing you to place the tiebreaker in a separate AZ/DC while keeping all frontends as ADDITIONAL_FRONTEND.

When to Use a Coordinator Node

  • Stretch clusters spanning 2+ regions, availability zones, or datacenters where you need a quorum tiebreaker in a third location

  • When you want all frontend nodes to be identical (ADDITIONAL_FRONTEND) without one special PRIMARY_FRONTEND

  • When the quorum tiebreaker AZ/DC doesn't need to serve user traffic (no WebUI, API, or Dashboards needed there)

Coordinator Node Services

A coordinator node runs only quorum-participating services:

| Service | Description |
|---|---|
| MongoDB Arbiter | Participates in replica set elections. Does not store data. |
| Kafka Controller | KRaft quorum voter (port 9094 only). Does not run a broker (no port 9092/9093). |
| OpenSearch Cluster Manager | Participates in cluster manager elections. Does not store data. |

Example Topology: 7-Node Stretch Cluster

The following example shows a 7-node stretch cluster across 2 AWS regions and 3 availability zones:

| Node | Region / AZ | Role |
|---|---|---|
| backend-1 | us-west-1a | Primary full backend |
| backend-2 | us-west-1a | Full backend |
| frontend-2 | us-west-1a | Additional frontend |
| backend-3 | us-west-2b | Full backend |
| backend-4 | us-west-2b | Full backend |
| coordinator-1 | us-west-2a | Coordinator (quorum tiebreaker) |
| frontend-3 | us-west-2b | Additional frontend |


An Availability Zone (AZ) is equivalent to a datacenter or an independent failure domain within a region.

In this topology:

  • 5 quorum voters: 4 backends + 1 coordinator (odd number for majority)

  • 2 frontends serving user traffic (both ADDITIONAL_FRONTEND)

  • Coordinator in us-west-2a acts as tiebreaker between the two regions
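The reason for the coordinator is majority arithmetic: with N voters, a quorum needs floor(N/2) + 1 of them. The sketch below checks whether losing any single failure domain still leaves a majority; the domain:voters argument format and the function names are illustrative.

```shell
# Sketch: does a majority quorum survive losing any single failure domain?
# Arguments are illustrative "domain:voters" pairs.
majority() {
  echo $(( $1 / 2 + 1 ))
}

survives_any_single_domain_loss() {
  local total=0 need pair left rc=0
  for pair in "$@"; do
    total=$(( total + ${pair#*:} ))
  done
  need=$(majority "$total")
  for pair in "$@"; do
    left=$(( total - ${pair#*:} ))
    if [ "$left" -ge "$need" ]; then
      echo "lose ${pair%%:*}: $left/$total voters left, need $need: quorum kept"
    else
      echo "lose ${pair%%:*}: $left/$total voters left, need $need: QUORUM LOST"
      rc=1
    fi
  done
  return "$rc"
}
```

For the 7-node example, survives_any_single_domain_loss us-west-1a:2 us-west-2b:2 us-west-2a:1 keeps quorum in every case (5 voters, majority 3). Without the coordinator (az-a:2 az-b:2), the majority of 4 is 3, so losing either side loses quorum. Grouping by region instead (us-west-1:2 us-west-2:3) shows that losing all of us-west-2 would leave 2 of 5 voters, so this topology tolerates the loss of any one AZ, not of an entire region.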

Inter-Datacenter Latency Requirements

A stretch cluster requires low-latency, stable network links between datacenters. Kafka, MongoDB, and OpenSearch perform synchronous replication and quorum coordination across nodes, so inter-datacenter latency directly impacts write performance, quorum elections, and pipeline throughput.

| Inter-Datacenter RTT | Minimum Throughput | Suitability | Notes |
|---|---|---|---|
| < 5ms | 1 Gbps+ | Ideal | Same metro area or campus network. All services operate at full performance. |
| 5-20ms | 1 Gbps+ | Recommended | Nearby datacenters or neighboring regions. Tested and recommended range for stretch clusters. |
| 20-50ms | 500 Mbps+ | Degraded | Write latency increases. Kafka consumer group rebalancing becomes slower. Scans might work but pipeline latency increases. |
| > 50ms | Any | Not recommended | Quorum coordination becomes fragile. Consumer groups may fail to stabilize. |

  • Zero packet loss and low jitter are as important as low latency.

  • These requirements apply to inter-datacenter links only. Intra-datacenter communication is assumed to be sub-millisecond with multi-Gbps throughput.

  • Based on stretch cluster testing with 18ms inter-datacenter latency (us-west-1 to us-west-2), all services including Kafka, MongoDB, and OpenSearch operated within acceptable parameters.
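To place a link in the table above, you can take the average RTT from ping's summary line. A sketch assuming Linux iputils or BSD/macOS ping output; the thresholds mirror the table, and a value exactly on a shared boundary (20 or 50 ms) is placed in the better row, which is an assumption. The same summary also reports packet loss and mdev (jitter), which matter just as much.

```shell
# Sketch: average RTT from ping output, mapped onto the suitability table.
avg_rtt_from_ping() {
  # Reads ping output on stdin; prints the avg field of the
  # "rtt min/avg/max/mdev = a/b/c/d ms" (or "round-trip ...") summary line.
  awk -F'/' '/^(rtt|round-trip)/ { print $5 }'
}

classify_rtt() {
  awk -v rtt="$1" 'BEGIN {
    if (rtt < 5)        print "Ideal"
    else if (rtt <= 20) print "Recommended"
    else if (rtt <= 50) print "Degraded"
    else                print "Not recommended"
  }'
}

# Usage (placeholder peer IP in the other datacenter):
#   rtt=$(ping -c 20 10.1.0.10 | avg_rtt_from_ping)
#   echo "avg RTT ${rtt} ms: $(classify_rtt "$rtt")"
```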

Configuration

Edit generate_envs/multi_node_config.conf on backend-1:

  1. Comment out the frontend-1 entries (coordinator replaces its quorum role):

BASH
#frontend1_private_ip=""
#frontend1_node_hostname=""
#frontend1_node_profile=""
  2. Set the coordinator IP (uncomment and populate):

BASH
coordinator1_private_ip="<coordinator-ip>"
  3. Append the coordinator node configuration block:

BASH
cat >> generate_envs/multi_node_config.conf <<'EOF'
coordinator1_node_hostname="coordinator-1"
coordinator1_node_profile="COORDINATOR"
EOF
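Running the cat >> command above a second time appends the block twice. If the step might be re-run, a guarded version such as the following sketch avoids duplicates. append_coordinator_block is a hypothetical helper name, and it assumes that an existing, uncommented coordinator1_node_profile line means the block is already present.

```shell
# Hypothetical helper: append the coordinator block only if it is not already there.
append_coordinator_block() {
  local conf=${1:-generate_envs/multi_node_config.conf}
  if grep -q '^coordinator1_node_profile=' "$conf"; then
    return 0   # block already present, nothing to do
  fi
  cat >> "$conf" <<'EOF'
coordinator1_node_hostname="coordinator-1"
coordinator1_node_profile="COORDINATOR"
EOF
}

# Usage on backend-1, from <installer_directory>:
#   append_coordinator_block
```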
  4. Set additional backend and frontend IPs as needed for your topology (e.g., backend3_private_ip, backend4_private_ip, frontend2_private_ip, frontend3_private_ip).

  5. Set OpenSearch index replication for AZ failover tolerance:

BASH
# In generate_envs/multi_node_config.conf, set:
opensearch_index_number_of_shards=4
opensearch_index_number_of_replicas=3

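With 4 OpenSearch data nodes across 2 AZs, number_of_replicas=3 gives each shard 4 copies (1 primary + 3 replicas). Because OpenSearch never places two copies of the same shard on one node, every data node holds a copy of every shard. A full AZ failure (losing 2 data nodes) therefore still leaves 2 copies of each shard, keeping the cluster YELLOW (all data accessible, some replicas unassigned) instead of RED (at least one primary shard unavailable).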
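The replica arithmetic can be sketched as a small function that reports the worst-case state of a single shard right after losing some data nodes, before any re-replication. It is an illustration only: shard_health_after_az_loss is a hypothetical name, it assumes OpenSearch places at most one copy of a shard per node, and "worst case" means every lost node held a copy.

```shell
# Hypothetical helper: worst-case health of one shard right after losing nodes_lost
# data nodes. Arguments: number_of_replicas, data node count, data nodes lost.
shard_health_after_az_loss() {
  local replicas=$1 data_nodes=$2 nodes_lost=$3
  local wanted=$(( replicas + 1 ))      # 1 primary + N replicas per shard
  local placed=$wanted
  if (( placed > data_nodes )); then
    placed=$data_nodes                  # at most one copy of a shard per node
  fi
  local left=$(( placed - nodes_lost )) # worst case: every lost node held a copy
  if (( left < 0 )); then
    left=0
  fi
  if (( left == 0 )); then
    echo "RED ($left of $wanted copies left)"
  elif (( left < wanted )); then
    echo "YELLOW ($left of $wanted copies left)"
  else
    echo "GREEN ($left of $wanted copies left)"
  fi
}
```

For example, shard_health_after_az_loss 3 4 2 prints YELLOW (2 of 4 copies left). With number_of_replicas=1, shard_health_after_az_loss 1 4 2 prints RED (0 of 2 copies left), because in the worst case both copies were in the failed AZ.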

  6. Generate environment files on backend-1:

BASH
./generate_envs/generate_envs.sh -t multi-node

This will generate env files for all nodes including env.coordinator-1. No env.frontend-1 will be generated since it was commented out.
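Before copying anything, it can help to confirm that an env file was produced for every node in the topology. The sketch below is hypothetical: check_env_files is not part of the installer, and the directory to pass is wherever generate_envs.sh wrote the env.<node> files on your system.

```shell
# Hypothetical check: report every expected env.<node> file missing from a directory.
# Returns non-zero if any file is missing.
check_env_files() {
  local dir=$1
  shift
  local node missing=0
  for node in "$@"; do
    if [[ ! -f "$dir/env.$node" ]]; then
      echo "missing: env.$node"
      missing=1
    fi
  done
  return "$missing"
}

# Usage (node names from the example topology):
#   check_env_files <env-dir> backend-1 backend-2 backend-3 backend-4 \
#     coordinator-1 frontend-2 frontend-3
```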

  7. Generate certificates on backend-1 (same as Step 3: Setup Certificates in the four-node section):

BASH
cd <installer_directory>/certificates/
./generate_certs.sh

This creates all certificates and packages them into kf-agilesec.internal-certs.tgz.

  8. Apply system tuning and set the admin password on backend-1:

BASH
cd <installer_directory>
sudo ./scripts/tune.sh -u <user>
cp .pass.example .pass
# Edit .pass and set admin_password
chmod 600 .pass
  9. Copy the certificate tarball from backend-1 to all other nodes:

BASH
cd <installer_directory>/certificates/
scp kf-agilesec.internal-certs.tgz $BE2_IP:<installer_directory>/certificates/
scp kf-agilesec.internal-certs.tgz $BE3_IP:<installer_directory>/certificates/
scp kf-agilesec.internal-certs.tgz $BE4_IP:<installer_directory>/certificates/
scp kf-agilesec.internal-certs.tgz $COORD1_IP:<installer_directory>/certificates/
scp kf-agilesec.internal-certs.tgz $FE2_IP:<installer_directory>/certificates/
scp kf-agilesec.internal-certs.tgz $FE3_IP:<installer_directory>/certificates/
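The scp commands above can also be written as a loop. This is a sketch under assumptions: copy_certs_to_nodes and INSTALLER_DIR are hypothetical names, INSTALLER_DIR must hold the same <installer_directory> path used on every node, and setting DRY_RUN=1 only prints the commands instead of running them.

```shell
# Hypothetical helper: copy the cert tarball to each node address given as an argument.
copy_certs_to_nodes() {
  local tarball=kf-agilesec.internal-certs.tgz ip
  if [[ -z ${INSTALLER_DIR:-} ]]; then
    echo "INSTALLER_DIR is not set" >&2
    return 1
  fi
  for ip in "$@"; do
    if [[ -n ${DRY_RUN:-} ]]; then
      echo scp "$tarball" "$ip:$INSTALLER_DIR/certificates/"   # print only
    else
      scp "$tarball" "$ip:$INSTALLER_DIR/certificates/" || return 1
    fi
  done
}

# Usage, from <installer_directory>/certificates/ on backend-1:
#   INSTALLER_DIR=<installer_directory> copy_certs_to_nodes \
#     "$BE2_IP" "$BE3_IP" "$BE4_IP" "$COORD1_IP" "$FE2_IP" "$FE3_IP"
```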

  10. On each of the other nodes, extract the certificates, copy the node's env file to .env, run tune.sh, and set the admin password. The process is the same as Steps 7-9 in the four-node section; only the env file name differs per node:

| Node | Env file to copy |
|---|---|
| backend-2 | cp env.backend-2 ../.env |
| backend-3 | cp env.backend-3 ../.env |
| backend-4 | cp env.backend-4 ../.env |
| coordinator-1 | cp env.coordinator-1 ../.env |
| frontend-2 | cp env.frontend-2 ../.env |
| frontend-3 | cp env.frontend-3 ../.env |

On each node:

BASH
cd <installer_directory>/certificates/
tar zxvf kf-agilesec.internal-certs.tgz
cp <env-file> ../.env
cd ..
sudo ./scripts/tune.sh -u <user>
cp .pass.example .pass
# Edit .pass and set admin_password
chmod 600 .pass
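Because only the env file name changes between nodes, the lookup in the env file table can be expressed as a case statement. This sketch assumes node names match those used in this guide (backend-2 to backend-4, coordinator-1, frontend-2, frontend-3); env_file_for_node is a hypothetical helper, and deriving the name from hostname -s works only if each machine's short hostname equals its node name.

```shell
# Hypothetical helper: print the env file for a node, following the env file table.
env_file_for_node() {
  case $1 in
    backend-[2-4] | coordinator-1 | frontend-[2-3])
      echo "env.$1"
      ;;
    *)
      echo "no env file mapping for node: $1" >&2
      return 1
      ;;
  esac
}

# Usage on each node, from <installer_directory>/certificates/ after extracting the
# tarball, assuming the short hostname equals the node name:
#   cp "$(env_file_for_node "$(hostname -s)")" ../.env
```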

Installation Order

IMPORTANT:

Kafka quorum during installation: When multiple backend nodes are installed in parallel, the Kafka KRaft quorum may go through re-elections as new nodes join. If Kafka stops running on any backend node during installation, manually restart it with ./scripts/manage.sh start kafka to help the quorum form and allow the installation to make progress.

Health check cron: As installation of each node completes, verify the health check cron job is active on all nodes (crontab -l). This cron automatically detects and restarts stopped services, which helps maintain Kafka quorum stability during normal operation.
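Both checks in the note above can be scripted. The sketch below assumes the manage.sh status layout shown under Post-Installation Verification (service name in the first column, status in the last) and that the health check cron line contains health_check.sh. service_not_running and health_check_cron_active are hypothetical helper names, and the manage.sh path may differ on your install (this guide shows both ./manage.sh and ./scripts/manage.sh).

```shell
# Hypothetical helper: read "manage.sh status" output on stdin and succeed
# if the named service is missing or its last column is not "Running".
service_not_running() {
  local svc=$1 st
  st=$(awk -v s="$svc" '$1 == s { print $NF }')
  [[ $st != Running ]]
}

# Hypothetical helper: read crontab text on stdin and succeed if an
# uncommented line runs health_check.sh.
health_check_cron_active() {
  grep -Eq '^[[:space:]]*[^#[:space:]].*health_check\.sh'
}

# Usage on a backend node during installation:
#   if ./scripts/manage.sh status | service_not_running kafka; then
#     ./scripts/manage.sh start kafka
#   fi
# Usage on each node after its installation completes:
#   crontab -l | health_check_cron_active || echo "health check cron not active"
```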

  1. Install backend-1 first and wait for it to complete. It initializes the MongoDB primary, the OpenSearch cluster manager, and the Kafka leader:

BASH
./install_analytics.sh install -u <user> -p <installation-dir> --non-interactive
  2. Install backend-2 after backend-1 completes. It adds the MongoDB secondary:

BASH
./install_analytics.sh install -u <user> -p <installation-dir> --non-interactive
  3. Install backend-3, backend-4, and coordinator-1 in parallel after backend-2 completes. The coordinator creates Kafka topics during this phase:

BASH
./install_analytics.sh install -u <user> -p <installation-dir> --non-interactive

Wait for all 3 to complete before proceeding.

  4. Install frontend-2 and frontend-3 in parallel after step 3 completes. Frontends depend on the Kafka topics created by the coordinator:

BASH
./install_analytics.sh install -u <user> -p <installation-dir> --non-interactive

Each node must have its correct .env file set, certificates extracted, and system tuning applied before running the install command.
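The ordering rules above (two sequential installs, then two parallel phases, each finished before the next starts) can be sketched as a driver run from backend-1. This is illustrative only: run_phase and install_on are hypothetical helpers, install_on assumes SSH access from backend-1 and that the placeholder variables INSTALLER_DIR, INSTALL_USER and INSTALL_PATH are set, and it does not replace the per-node preparation described above.

```shell
# Hypothetical helper: run "$cmd <host>" for every host in parallel, wait for
# all of them, and fail if any host failed.
run_phase() {
  local cmd=$1
  shift
  local host pid rc=0
  local pids=()
  for host in "$@"; do
    "$cmd" "$host" &
    pids+=("$!")
  done
  for pid in "${pids[@]}"; do
    wait "$pid" || rc=1
  done
  return "$rc"
}

# Hypothetical per-node step: run the documented install command over SSH.
install_on() {
  ssh "$1" "cd '$INSTALLER_DIR' && ./install_analytics.sh install -u '$INSTALL_USER' -p '$INSTALL_PATH' --non-interactive"
}

# Usage (the *_IP variables are placeholders for each node's address):
#   run_phase install_on "$BE1_IP" &&
#   run_phase install_on "$BE2_IP" &&
#   run_phase install_on "$BE3_IP" "$BE4_IP" "$COORD1_IP" &&
#   run_phase install_on "$FE2_IP" "$FE3_IP"
```

Chaining the phases with && stops the rollout as soon as any phase reports a failure, which matches the requirement that each phase complete before the next begins.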

Post-Installation Verification

On the coordinator node, you should see 3 services:

CODE
$ ./manage.sh status
SERVICE                   DESCRIPTION                              STATUS
------------------------  ---------------------------------------- -----------
mongodb                   MongoDB Server                           Running
opensearch                OpenSearch Search Engine                 Running
kafka                     Kafka Server                             Running

On frontend nodes (ADDITIONAL_FRONTEND), you should see 6 services:

CODE
$ ./manage.sh status
SERVICE                   DESCRIPTION                              STATUS
------------------------  ---------------------------------------- -----------
sm                        Security Manager Microservice            Running
api                       Web API Microservice                     Running
webui                     Web UI Microservice                      Running
opensearch-dashboards     OpenSearch Dashboards                    Running
cbom                      CBOM Exporter Microservice               Running
haproxy                   HAProxy Load Balancer                    Running

Backend nodes should show the standard 9 services (mongodb, opensearch, kafka, scheduler, analytics-manager, ingestion, indexing, sm, haproxy).
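The expected service lists above can be turned into a quick per-role check. A minimal sketch, assuming the manage.sh status column layout shown above; verify_role and expected_services are hypothetical names, and the role labels coordinator, frontend and backend are chosen for this example.

```shell
# Hypothetical helper: expected services per role, taken from the lists above.
expected_services() {
  case $1 in
    coordinator) echo "mongodb opensearch kafka" ;;
    frontend)    echo "sm api webui opensearch-dashboards cbom haproxy" ;;
    backend)     echo "mongodb opensearch kafka scheduler analytics-manager ingestion indexing sm haproxy" ;;
    *)           return 1 ;;
  esac
}

# Hypothetical helper: read "manage.sh status" output on stdin, print each expected
# service that is missing or not Running, and fail if there is any.
verify_role() {
  local role=$1 status svc bad=0 list
  list=$(expected_services "$role") || { echo "unknown role: $role" >&2; return 2; }
  status=$(cat)
  for svc in $list; do
    if [[ $(awk -v s="$svc" '$1 == s { print $NF }' <<<"$status") != Running ]]; then
      echo "not running: $svc"
      bad=1
    fi
  done
  return "$bad"
}

# Usage on the coordinator node:
#   ./manage.sh status | verify_role coordinator
```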

Quorum Considerations

The coordinator node's primary purpose is to maintain an odd quorum voter count for Kafka KRaft, MongoDB replica set, and OpenSearch cluster manager elections.

| Component | Quorum Voters | Majority Needed |
|---|---|---|
| Kafka KRaft | 5 (4 backends + 1 coordinator) | 3 |
| MongoDB | 5 (4 backends + 1 coordinator arbiter) | 3 |
| OpenSearch | 5 (4 backends + 1 coordinator) | 3 |

Placement guidance:

  • Place the coordinator in a different AZ from the majority of your backend nodes, so that no single AZ hosts 3 or more of the 5 quorum voters. If any one AZ then fails, at least 3 of the 5 voters survive in the remaining AZs and quorum majority is retained.

  • In the example topology above, if us-west-1a fails (losing backend-1, backend-2, frontend-2), the surviving nodes in us-west-2 (backend-3, backend-4, coordinator-1, frontend-3) retain 3 of 5 quorum voters.
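The majority rule behind the quorum table can be checked per failure domain: N voters need floor(N/2) + 1 of them, so for each AZ the question is whether the voters outside it still reach that number. The sketch below is illustrative; check_az_losses is a hypothetical helper that takes the number of voters in each AZ and assumes the same voter count applies to Kafka KRaft, MongoDB and OpenSearch, as in the table.

```shell
# Hypothetical helper: arguments are the number of quorum voters in each AZ.
# For each AZ, report whether the remaining voters still form a strict majority.
check_az_losses() {
  local n total=0 i=0
  for n in "$@"; do
    total=$(( total + n ))
  done
  local need=$(( total / 2 + 1 ))   # strict majority of all voters
  for n in "$@"; do
    i=$(( i + 1 ))
    if (( total - n >= need )); then
      echo "AZ$i down: $(( total - n ))/$total voters left, quorum kept"
    else
      echo "AZ$i down: $(( total - n ))/$total voters left, quorum LOST"
    fi
  done
}
```

For example, check_az_losses 2 2 1 (two backends in each of two AZs, coordinator in a third) keeps quorum whichever AZ fails, while check_az_losses 2 3 (coordinator sharing an AZ with two backends) reports "AZ2 down: 2/5 voters left, quorum LOST" for that AZ.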

Testing Your HA Setup

Testing Frontends

Stop one of the frontends and run a network scan through the UI. The scan should complete successfully, confirming that the remaining frontend(s) can handle all traffic.

Testing Backends

Stop one of the backends and run a network scan through the UI. The scan should complete successfully, confirming the cluster remains operational with a single backend node failure.

Note: For a truly highly available cluster, ensure you can lose any single node (frontend or backend) without service interruption.

Testing Datacenter Failover (Stretch Cluster)

This test validates that the cluster survives a full datacenter/AZ failure. It applies to stretch cluster topologies with a coordinator node.

Steps:

  1. Disable health check cron on all nodes (crontab -l | sed 's|^\(.* .*health_check\.sh\)$|#DISABLED# \1|' | crontab -).

  2. Stop all services on all nodes in one datacenter (e.g., backend-1, backend-2, frontend-2 in DC-1).

  3. Run a network scan and a GitHub scan.

  4. Both scans should complete successfully.

IMPORTANT: This test requires opensearch_index_number_of_replicas=3 and Kafka topics with replication factor matching the number of backend nodes (set during installation). Without these, some indices or partitions may become unavailable during a full AZ failure.
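Step 1 disables the health check cron with a sed command, and the cron should be restored once the test is over. The following sketch wraps that same sed expression together with a reverse operation. disable_health_check_cron and enable_health_check_cron are hypothetical names, and, like the original command, the disable pattern only matches crontab lines that end in health_check.sh.

```shell
# Hypothetical helper: same sed as step 1; comments out lines ending in health_check.sh.
disable_health_check_cron() {
  sed 's|^\(.* .*health_check\.sh\)$|#DISABLED# \1|'
}

# Hypothetical helper: undo it by stripping the #DISABLED# prefix, including
# repeated prefixes left by running the disable step more than once.
enable_health_check_cron() {
  sed 's|^\(#DISABLED# \)*||'
}

# Usage on each node:
#   crontab -l | disable_health_check_cron | crontab -   # before the test
#   crontab -l | enable_health_check_cron | crontab -    # after the test
```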
