On-Prem High Availability (HA) Guide
This guide describes how to configure the AgileSec Platform for high availability in on-premises environments.
Overview
This guide provides instructions for the following scenarios:
Setting up a highly available (no single point of failure) AgileSec platform cluster for a new installation.
Adding capacity to an existing cluster to eliminate single points of failure and achieve high availability.
Note: HA in this context means the cluster remains operational if any one node fails. In multi-region, multi-availability-zone, or multi-datacenter deployments (called a Stretch Cluster), HA also means that the cluster remains operational if one datacenter fails.
Prerequisites
Ensure you have the installer package and your nodes meet the minimum memory, CPU, and disk requirements.
Requirements | Backend (Minimum) | Backend (Production) | Scan-node | Frontend | Additional Frontends | Coordinator |
|---|---|---|---|---|---|---|
CPU cores | 4 | 8 | 2 | 4 | 2 | 2 |
Memory | 32GB | 64GB (small scan volume), 128GB (large scan volume) | 8GB | 16GB | 16GB | 16GB |
Disk space | 50GB | 100GB (small scan volume), 200GB+ (large scan volume) | 50GB | 50GB | 50GB | 50GB |
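A node can be checked against the table before installation. The following is a minimal sketch, assuming a Linux node with nproc, /proc/meminfo, and GNU df; the function names (meets_minimum, node_cpu, node_mem, node_disk) are hypothetical helpers and not part of the installer. Memory is rounded up to the next whole GB because the kernel reports slightly less than the nominal size.

```shell
# meets_minimum CPU MEM_GB DISK_GB MIN_CPU MIN_MEM_GB MIN_DISK_GB
# Returns 0 when every actual value meets its minimum; prints each shortfall.
meets_minimum() {
  local rc=0
  [ "$1" -ge "$4" ] || { echo "CPU cores: have $1, need $4"; rc=1; }
  [ "$2" -ge "$5" ] || { echo "Memory: have ${2}GB, need ${5}GB"; rc=1; }
  [ "$3" -ge "$6" ] || { echo "Disk: have ${3}GB, need ${6}GB"; rc=1; }
  return "$rc"
}

# Gather the actual values on the local node.
node_cpu()  { nproc; }
# MemTotal is reported in kB; round up to whole GB.
node_mem()  { awk '/^MemTotal:/ {printf "%d\n", ($2 + 1048575) / 1048576}' /proc/meminfo; }
# Available space (GB) on the filesystem holding the given path (default /).
node_disk() { df -BG --output=avail "${1:-/}" | tail -n 1 | tr -dc '0-9'; }
```

For example, `meets_minimum "$(node_cpu)" "$(node_mem)" "$(node_disk /)" 4 32 50` checks the local node against the Backend (Minimum) column.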
For convenient file transfers between nodes, it is recommended to set up SSH key-based password-less authentication from backend-1 to all other nodes, though manual file transfer methods can also be used. Certificates and configuration files will be generated on backend-1 and copied to other nodes. Use consistent file names and directory structures when copying files between machines.
Familiarity with the basic 3-node (two backends, one frontend) On-Prem installation guide is recommended. The section Adding a new Frontend to an existing cluster assumes you have a running 3-node cluster.
Four-Node Installation for New Cluster (2 Backends, 2 Frontends)
Note: It is recommended to build the cluster nodes in the following order.
Step 1: Setup Cluster Configuration on Backend-1 (be-1)
Configure the cluster topology by editing the generate_envs/multi_node_config.conf file with your environment-specific details. Add the private IPs of all your nodes (be-1, be-2, fe-1, fe-2).
cd <installer_directory>
vi generate_envs/multi_node_config.conf
Add the private IPs of your two backend nodes and two frontend nodes.
Uncomment the following frontend-2 entries in generate_envs/multi_node_config.conf:
frontend2_private_ip
frontend2_node_hostname
frontend2_node_profile
After editing, your entries should look similar to the following:
grep -e '^frontend' -e '^backend' generate_envs/multi_node_config.conf
backend1_private_ip="X.X.X.X"
backend2_private_ip="X.X.X.X"
frontend1_private_ip="X.X.X.X"
frontend2_private_ip="X.X.X.X"
backend1_node_hostname="backend-1"
backend1_node_profile="PRIMARY_FULL_BACKEND"
backend2_node_hostname="backend-2"
backend2_node_profile="FULL_BACKEND"
frontend1_node_hostname="frontend-1"
frontend1_node_profile="PRIMARY_FRONTEND"
frontend2_node_hostname="frontend-2"
frontend2_node_profile="ADDITIONAL_FRONTEND"
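Before generating the environment files, it can help to confirm that no IP entry was left commented out or at its X.X.X.X placeholder. The sketch below assumes the key="value" layout shown above; check_node_ips is a hypothetical helper, not an installer command.

```shell
# check_node_ips CONF KEY...
# Returns 0 when every KEY is set to an IPv4-looking value in CONF.
check_node_ips() {
  local conf="$1" key value rc=0
  shift
  for key in "$@"; do
    # Take the last uncommented assignment of the key and strip the quotes.
    value=$(sed -n "s/^${key}=\"\{0,1\}\([^\"]*\)\"\{0,1\}[[:space:]]*\$/\1/p" "$conf" | tail -n 1)
    if ! printf '%s\n' "$value" | grep -Eq '^([0-9]{1,3}\.){3}[0-9]{1,3}$'; then
      echo "$key is missing or not an IPv4 address: '$value'"
      rc=1
    fi
  done
  return "$rc"
}
```

For the four-node layout, this could be called as `check_node_ips generate_envs/multi_node_config.conf backend1_private_ip backend2_private_ip frontend1_private_ip frontend2_private_ip`.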
Step 2: Generate Configuration Files for All Nodes
The following command will generate four configuration files, one for each node:
./generate_envs/generate_envs.sh -t multi-node
Generated files will be:
./generate_envs/generated_envs/env.backend-1
./generate_envs/generated_envs/env.backend-2
./generate_envs/generated_envs/env.frontend-1
./generate_envs/generated_envs/env.frontend-2
The generate_envs.sh script will also copy the env.backend-1 file to <installer_directory>/.env.
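A quick way to confirm that generation succeeded is to check that each expected file exists and is non-empty. This is a small sketch; missing_env_files is a hypothetical helper, and the node names passed to it are the hostnames from multi_node_config.conf.

```shell
# missing_env_files DIR NODE...
# Prints the path of every env.NODE file that is absent or empty in DIR.
# Returns 0 only when all files are present.
missing_env_files() {
  local dir="$1" node rc=0
  shift
  for node in "$@"; do
    if [ ! -s "$dir/env.$node" ]; then
      echo "$dir/env.$node"
      rc=1
    fi
  done
  return "$rc"
}
```

Usage: `missing_env_files ./generate_envs/generated_envs backend-1 backend-2 frontend-1 frontend-2` prints nothing when all four files were generated.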
Step 3: Setup Certificates
Run the following to generate all certificates and self-sign them using the <installer_directory>/certificates/generate_certs.sh script. Alternatively, you can use certificates generated by your own CA. For POCs and first-time installations, it is recommended to generate all certificates using generate_certs.sh.
cd certificates/
./generate_certs.sh
The ./generate_certs.sh script will also create a file called kf-agilesec.internal-certs.tgz, which needs to be copied to all other nodes. This file conveniently contains the env.backend-1, env.backend-2, env.frontend-1, and env.frontend-2 files in addition to the certificates.
Step 4: Copy Files to All Other Nodes
Copy files to all other nodes from backend-1. For each node, copy the kf-agilesec.internal-certs.tgz file to <installer_directory>/certificates/.
scp kf-agilesec.internal-certs.tgz $BE2_IP:<installer_directory>/certificates/
scp kf-agilesec.internal-certs.tgz $FE1_IP:<installer_directory>/certificates/
scp kf-agilesec.internal-certs.tgz $FE2_IP:<installer_directory>/certificates/
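With more than a few nodes, the copy can be looped. The sketch below assumes SSH key-based access from backend-1 as described in the prerequisites; copy_bundle and the DRY_RUN switch are hypothetical conveniences, not installer features. By default it only prints the scp commands so they can be reviewed first.

```shell
# copy_bundle BUNDLE DEST_DIR HOST...
# With DRY_RUN=1 (the default here) the scp commands are printed, not executed.
# Set DRY_RUN=0 to perform the copy.
copy_bundle() {
  local bundle="$1" dest="$2" host
  shift 2
  for host in "$@"; do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "scp $bundle $host:$dest"
    else
      scp "$bundle" "$host:$dest" || { echo "copy to $host failed" >&2; return 1; }
    fi
  done
}
```

The loop stops at the first failed copy so a partially distributed bundle is noticed immediately.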
Step 5: External FQDN Setup (Optional)
DNS-based load balancing for frontends can provide an external stable FQDN for both installation and post-install operations. Below are the steps to configure your DNS using AWS Route53 as your DNS provider. Other DNS providers supporting DNS-based load balancing can be configured similarly.
Assuming you own <external domain>, create a public hosted zone for <external domain> in AWS Route53 and follow the steps below to configure the external FQDN in that zone.
For each frontend IP address, create a DNS entry for <analytics_hostname> in your public hosted zone for <external domain> with "Record type = A" and "Routing policy = Multivalue Answer", using the frontend IP address as the value. You can set "Record ID" to any value, e.g., "fe-1" for frontend-1 and "fe-2" for frontend-2.
Validate that your DNS configuration has propagated by using the following command. The number of entries returned should equal the number of your frontend nodes:
dig +short <analytics_hostname>.<external domain> A
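The record-count check can be scripted. This sketch reads the output of a dig +short query on standard input and compares the number of distinct IPv4 answers with the expected number of frontends; count_a_records and dns_matches_frontends are hypothetical helper names.

```shell
# count_a_records: reads `dig +short NAME A` output on stdin and prints
# the number of distinct IPv4 addresses (CNAME lines are ignored).
count_a_records() {
  grep -E '^([0-9]{1,3}\.){3}[0-9]{1,3}$' | sort -u | wc -l | tr -d ' '
}

# dns_matches_frontends EXPECTED  (dig output on stdin)
dns_matches_frontends() {
  local expected="$1" actual
  actual=$(count_a_records)
  [ "$actual" -eq "$expected" ] || {
    echo "expected $expected A records, found $actual"
    return 1
  }
}
```

For a two-frontend cluster, pipe the dig query into it: `dig +short "$FQDN" A | dns_matches_frontends 2`, where FQDN holds your external hostname.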
Step 6: Install on Backend-1 (BE-1)
Ensure you have either a DNS entry for <analytics_hostname>.<external domain> (recommended for HA) pointing to the frontend IPs, or an entry in your /etc/hosts for <analytics_hostname>.<external domain> pointing to either frontend-1's or frontend-2's IP address.
Since all environment files were generated on BE-1, there should already be a .env file at <installer_directory>/.env on BE-1. Run the following to install the software on BE-1:
cd <installer_directory>
sudo ./scripts/tune.sh -u <user>
./install_analytics.sh -u <user> -p <installation-dir>
Note: <installation-dir> is the directory where your installed files will reside.
Step 7: Install on Backend-2 (BE-2)
Ensure you have either a DNS entry for <analytics_hostname>.<external domain> (recommended for HA) pointing to the frontend IPs, or an entry in your /etc/hosts for <analytics_hostname>.<external domain> pointing to either frontend-1's or frontend-2's IP address. If you have an upstream load balancer, it is better to use the IP of the load balancer.
On BE-2, run the following to unarchive the files, copy the .env to the <installer_directory> root, and install the software:
cd <installer_directory>/certificates/
tar zxvf kf-agilesec.internal-certs.tgz
cp env.backend-2 ../.env
cd ../
sudo ./scripts/tune.sh -u <user>
./install_analytics.sh -u <user> -p <installation-dir>
Step 8: Install on Frontend-1 (FE-1)
If your DNS provider does not resolve the FQDN for FE-1, add the following entry to your /etc/hosts: $FE-1_IP agilesec.kf-agilesec.com.
Run the following to unarchive the files, copy the .env to the <installer_directory> root, and install the software:
cd <installer_directory>/certificates/
tar zxvf kf-agilesec.internal-certs.tgz
cp env.frontend-1 ../.env
cd ../
sudo ./scripts/tune.sh -u <user>
./install_analytics.sh -u <user> -p <installation-dir> -v
At the end of installation, the installer will provide the following access details:
Access information for Web UI
Login URL
Admin username - Password (as provided during installation)
Ingestion service endpoint for v3 unified sensor
Ingestion endpoint for v2 sensors
Step 9: Install on Frontend-2 (FE-2)
If your DNS provider does not resolve the FQDN for FE-2, add the following entry to your /etc/hosts: $FE-2_IP agilesec.kf-agilesec.com.
Run the following to unarchive the files, copy the .env to the <installer_directory> root, and install the software:
cd <installer_directory>/certificates/
tar zxvf kf-agilesec.internal-certs.tgz
cp env.frontend-2 ../.env
cd ../
sudo ./scripts/tune.sh -u <user>
./install_analytics.sh -u <user> -p <installation-dir>
Adding a New Frontend to an Existing Cluster
Assumption: This section assumes you already have an existing working cluster with at least one frontend node. For this specific example, we assume you have a 4-node (BE-1, BE-2, FE-1, FE-2) working cluster set up in the previous section. This example can also be applied if you have a 3-node working cluster with BE-1, BE-2, and FE-1 and are adding FE-2.
To add a new frontend node called frontend-3 (FE-3), follow these steps:
Step 1: On BE-1, Add a New Frontend-3 Configuration Block
A. Ensure <installer_directory>/generate_envs/multi_node_config.conf has the following configurations added for FE-3. Add your private IP to the frontend3_private_ip field:
frontend3_node_hostname="frontend-3"
frontend3_private_ip="X.X.X.X"
frontend3_node_profile="ADDITIONAL_FRONTEND"
B. After completing the above step, your frontend configurations should look like this:
grep -e '^frontend' generate_envs/multi_node_config.conf
frontend1_node_hostname="frontend-1"
frontend1_private_ip="X.X.X.X"
frontend1_node_profile="PRIMARY_FRONTEND"
frontend2_node_hostname="frontend-2"
frontend2_private_ip="X.X.X.X"
frontend2_node_profile="ADDITIONAL_FRONTEND"
frontend3_node_hostname="frontend-3"
frontend3_private_ip="X.X.X.X"
frontend3_node_profile="ADDITIONAL_FRONTEND"
Note: The difference between PRIMARY_FRONTEND and ADDITIONAL_FRONTEND is that PRIMARY_FRONTEND runs additional quorum-coordinating services: MongoDB arbiter, OpenSearch cluster-manager, and kafka-controller.
Step 2: Generate Configuration Files for Each Node
A. Run ./generate_envs/generate_envs.sh -t multi-node to regenerate the following files:
<installer_directory>/generate_envs/generated_envs/env.backend-1
<installer_directory>/generate_envs/generated_envs/env.backend-2
<installer_directory>/generate_envs/generated_envs/env.frontend-1
<installer_directory>/generate_envs/generated_envs/env.frontend-2
<installer_directory>/generate_envs/generated_envs/env.frontend-3
Step 3: From BE-1, Copy All Frontend Configuration Files
Copy all frontend configuration files to their respective frontend machines:
scp <installer_directory>/generate_envs/generated_envs/env.frontend-1 \
$FE1_IP:<installer_directory>/.env
scp <installer_directory>/generate_envs/generated_envs/env.frontend-2 \
$FE2_IP:<installer_directory>/.env
scp <installer_directory>/generate_envs/generated_envs/env.frontend-3 \
$FE3_IP:<installer_directory>/.env
Copy the certificates bundle kf-agilesec.internal-certs.tgz to FE-3:
scp <installer_directory>/certificates/kf-agilesec.internal-certs.tgz \
$FE3_IP:<installer_directory>/certificates/
Step 4: Install FE-3
If your DNS provider does not resolve the FQDN for FE-3, add the following entry to your /etc/hosts: $FE-3_IP agilesec.kf-agilesec.com.
Run the following to install FE-3:
cd <installer_directory>/certificates/
tar zxvf kf-agilesec.internal-certs.tgz
cd ../
sudo ./scripts/tune.sh -u <user>
./install_analytics.sh -u <user> -p <installation-dir>
Step 5: Patch the Existing Frontends (FE-1 and FE-2)
A. On FE-1
cd <installer_directory>
./install_analytics.sh -u <user> -p <installation-dir> patch new-frontend -v
sudo ./scripts/tune.sh -u <user>
B. On FE-2
cd <installer_directory>
./install_analytics.sh -u <user> -p <installation-dir> patch new-frontend -v
sudo ./scripts/tune.sh -u <user>
Adding an Upstream Load Balancer
An upstream load balancer distributes external traffic across your frontend nodes, providing a single stable entry point for the platform. You can use either a Layer 4 (TCP) or Layer 7 (Application/HTTPS) load balancer depending on your requirements.
Feature | Layer 4 (TCP) | Layer 7 (HTTPS) |
|---|---|---|
SSL handling | Passthrough - LB forwards encrypted traffic as-is | Terminates TLS and optionally re-encrypts to backend |
URL-based routing | No | Yes - can route by URL path |
Example products | AWS NLB, Azure Load Balancer | AWS ALB, Azure Application Gateway |
Note: The examples in this section use port 8443 or 443, but frontends can be configured to use different ports. Adjust the port numbers according to your configuration.
Layer 4 (TCP) Load Balancer
A Layer 4 load balancer operates at the TCP level, forwarding encrypted traffic as-is to the backend without inspecting or modifying it. No TLS certificate is needed on the load balancer itself. The backend's TLS certificate is presented directly to the client. This is the simpler option when you do not need URL-based routing or TLS inspection at the load balancer.
Generic Configuration Steps for AWS NLB
1. Create a target group or backend pool with your frontend node(s) registered on the frontend port (e.g., 8443 or 443).
2. Configure health checks: either a TCP port check (simple, confirms the port is open) or an HTTP/HTTPS check to /health-check (preferred, confirms HAProxy is responding).
3. Create the L4 load balancer with a TCP listener on the frontend port.
4. Forward the listener to the target group/backend pool.
5. Lock down frontend firewall rules: only allow traffic from the LB's security group or subnet, so clients cannot bypass the LB and hit HAProxy directly.
6. Point DNS to the LB's endpoint.
7. Validate: log in and run a smoke test through the LB.
Key Considerations
No TLS certificate is needed on the LB. Traffic passes through encrypted as-is to the backend.
The backend's TLS certificate (self-signed or public CA) is presented directly to the client. The browser will see the certificate configured at HAProxy.
TCP health checks are simpler but less informative. They only confirm the port is open, not whether the application is responding correctly. Use HTTP/HTTPS health checks to /health-check for better reliability.
Ensure the OS-level firewall (firewalld on RHEL) is disabled or allows traffic on the frontend port.
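Before placing a load balancer in front of the frontends, each node's /health-check endpoint can be probed directly. The following is a sketch, assuming curl is installed and that any 2xx or 3xx status means healthy (the exact status returned by /health-check is an assumption); probe_frontend and check_frontends are hypothetical helpers.

```shell
# probe_frontend HOST PORT
# Prints the HTTP status of /health-check, or 000 when the node is unreachable.
# -k skips certificate verification, which suits self-signed certificates.
probe_frontend() {
  curl -k -s -o /dev/null --max-time 5 -w '%{http_code}' "https://$1:$2/health-check"
}

# check_frontends PORT HOST...
# Returns 0 only when every frontend answers with a 2xx or 3xx status.
check_frontends() {
  local port="$1" host code rc=0
  shift
  for host in "$@"; do
    code=$(probe_frontend "$host" "$port")
    case "$code" in
      2??|3??) echo "$host: healthy ($code)" ;;
      *)       echo "$host: unhealthy (${code:-no response})"; rc=1 ;;
    esac
  done
  return "$rc"
}
```

Usage: `check_frontends 8443 10.0.1.21 10.0.1.22`, with the port and private IPs replaced by your own.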
Example: AWS Network Load Balancer with port 8443
The following steps demonstrate configuring an AWS NLB to forward TCP traffic to HAProxy on the frontend nodes. The same concepts apply to other cloud providers or on-premises Layer 4 TCP load balancers.
Prerequisites on the Frontend Nodes
Confirm HAProxy is listening on 0.0.0.0:8443 (or the instance's private IP on 8443).
Ensure each node is reachable on port 8443 from within the network.
Decide how you want health checks to work:
Easiest: TCP health check on 8443 (checks if port is open)
Better: HTTP/HTTPS health check to an HAProxy endpoint (checks if HAProxy is working). HAProxy frontends can respond to the /health-check endpoint.
Step 1: Create an AWS Target Group for the Frontend Nodes
In EC2 Console → Target Groups → Create target group:
Target type
Select Instances (typical for EC2) or IP (if you want to register IPs directly)
Protocol / Port
Protocol: TCP
Port: (e.g., 8443 or 443)
Health checks
Protocol: TCP (simple) or HTTP/HTTPS (preferred, since we have the /health-check URL)
Port: Traffic port (same as your frontend port)
Create the target group, then Register targets:
Add your frontend instances (or IPs)
After registering, check the Targets tab → Health status to ensure they become healthy
Step 2: Create the Network Load Balancer (NLB)
In EC2 Console → Load Balancers → Create load balancer → Network Load Balancer:
Scheme: Internet-facing (public) or Internal (private-only) based on your organizational policy and needs
IP address type: IPv4
Network mapping:
Select the VPC
Select the subnet in which your frontend VMs reside
Optional: Create/choose an NLB security group to allow inbound TCP traffic on your frontend port from the sources you want (0.0.0.0/0 for public, or your corporate CIDRs, etc.)
Step 3: Add the Listener and Attach the Target Group
While creating the NLB (or afterward):
Listener
Protocol: TCP
Port: (e.g., 8443 or 443)
Default action
Forward to the target group you created in the previous step
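The console steps above (target group, NLB, listener) can also be sketched with the AWS CLI elbv2 commands. This is an illustrative outline under stated assumptions, not a tested deployment script: create_nlb, aws_cmd, DRY_RUN, and NLB_SCHEME are hypothetical names, an HTTPS health check to /health-check is assumed, and with DRY_RUN=1 (the default here) the commands are only printed with a placeholder ARN so the flow can be reviewed offline.

```shell
# aws_cmd: runs the AWS CLI, or with DRY_RUN=1 prints the command to stderr
# and a placeholder ARN to stdout.
aws_cmd() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "aws $*" >&2
    echo "arn:placeholder"
  else
    aws "$@"
  fi
}

# create_nlb NAME VPC_ID SUBNET_ID PORT INSTANCE_ID...
create_nlb() {
  local name="$1" vpc_id="$2" subnet_id="$3" port="$4" tg_arn lb_arn id
  shift 4   # remaining arguments: EC2 instance IDs of the frontend nodes

  # Step 1: target group with an HTTPS health check to /health-check.
  tg_arn=$(aws_cmd elbv2 create-target-group --name "$name-tg" \
    --protocol TCP --port "$port" --vpc-id "$vpc_id" --target-type instance \
    --health-check-protocol HTTPS --health-check-path /health-check \
    --query 'TargetGroups[0].TargetGroupArn' --output text) || return 1
  for id in "$@"; do
    aws_cmd elbv2 register-targets --target-group-arn "$tg_arn" \
      --targets "Id=$id" >/dev/null || return 1
  done

  # Step 2: the NLB itself (internal unless NLB_SCHEME=internet-facing).
  lb_arn=$(aws_cmd elbv2 create-load-balancer --name "$name" --type network \
    --scheme "${NLB_SCHEME:-internal}" --subnets "$subnet_id" \
    --query 'LoadBalancers[0].LoadBalancerArn' --output text) || return 1

  # Step 3: TCP listener forwarding to the target group.
  aws_cmd elbv2 create-listener --load-balancer-arn "$lb_arn" \
    --protocol TCP --port "$port" \
    --default-actions "Type=forward,TargetGroupArn=$tg_arn" >/dev/null || return 1
  echo "$lb_arn"
}
```

Running `create_nlb agilesec-nlb vpc-0abc subnet-0abc 8443 i-0aaa i-0bbb` in dry-run mode prints the four kinds of CLI calls in order; all IDs shown are made-up examples.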
Step 4: Lock Down the HAProxy Instances' Security Group (Important)
On the HAProxy instances' security group, ensure inbound rules allow:
TCP traffic on your frontend port (e.g., 8443 or 443) from the NLB security group (recommended), so clients cannot hit HAProxy directly
The health check port (same port if using traffic-port health checks)
Step 5: Validate
Either point your external FQDN to the NLB (recommended) or update your /etc/hosts to point to the NLB IP address for local testing.
Log in to https://<analytics_hostname>.<external domain>:<your_frontend_port> and run a network scan as a smoke test. For smoke test execution details, see either the single-node or multi-node installation guide.
Layer 7 (Application / HTTPS) Load Balancer
A Layer 7 load balancer operates at the HTTP/HTTPS level, providing SSL termination, URL-based routing, HTTP-aware health probes, and optional WAF (Web Application Firewall) capabilities. This is recommended when you need TLS inspection, path-based routing, or advanced health checking.
When using a Layer 7 load balancer with AgileSec platform, you must decide how SSL/TLS is handled between the load balancer and the backend frontend nodes:
Option | How it works | Pros | Cons |
|---|---|---|---|
End-to-end SSL (recommended) | LB terminates incoming TLS, re-encrypts traffic to backend on HTTPS | Backend stays on HTTPS, no application changes needed | Requires uploading the backend CA cert to the LB for trust |
SSL offloading | LB terminates TLS, forwards plain HTTP to backend | Simpler LB configuration | Requires changing AgileSec backend to accept HTTP (not recommended) |
Recommended: End-to-end SSL. The AgileSec backend continues to run on HTTPS, and the LB re-encrypts traffic to the backend. This requires the LB to trust the backend's TLS certificate, which means uploading the AgileSec root CA certificate to the LB (only needed with a self-signed or private CA certificate).
Generic Configuration Steps
Regardless of the cloud provider or load balancer product, the following steps are required:
1. Provision a public IP for the load balancer's frontend endpoint.
2. Obtain a TLS certificate for the LB's public-facing listener. Use a public CA certificate for production, or a self-signed certificate for testing.
3. Export the AgileSec root CA certificate (for self-signed/private cert only) from the backend. This is located at <installation-dir>/certificates/ca/agilesec-rootca-cert.pem and is needed so the LB can trust the backend's self-signed TLS certificate during the HTTPS handshake.
4. Create the Layer 7 load balancer with:
   - A frontend HTTPS listener on port 443 with the frontend TLS certificate
   - A backend pool pointing to the frontend node(s) private IP(s) on the configured frontend port (e.g., 443 or 8443)
   - Backend protocol set to HTTPS (for end-to-end SSL)
5. Upload the AgileSec root CA as a trusted backend root certificate on the LB and associate it with the backend HTTP settings.
6. Configure an HTTPS health probe to a known endpoint (e.g., /signin). Important: The probe's Host header must match the backend certificate's CN (typically <analytics_hostname>.<analytics_domain>), not the backend's IP address. A mismatch causes the SSL handshake to fail and the backend to appear unhealthy.
7. Update firewall rules to allow traffic from the LB's subnet/security group to the frontend nodes on the frontend port.
8. Open the frontend port in the OS-level firewall on each frontend node. On RHEL, disable firewalld or allow traffic for 443/tcp.
9. Point DNS to the load balancer's public IP address.
10. Validate access to the platform URL, log in, and run a smoke test.
Key Considerations
502 Bad Gateway: This typically means the health probe is failing. Check the backend health status on the LB dashboard or via CLI. The most common cause is a certificate CN mismatch (see step 6 above).
Certificate CN mismatch: The LB connects to backends using their private IP address, but the backend TLS certificate has a hostname-based CN (e.g., agilesec.kf-agilesec.com). The health probe and backend HTTP settings must send the correct Host header matching the certificate CN. It should not be the IP address.
Two-layer firewall: Both the cloud-level firewall (security groups, NSGs) AND the OS-level firewall (firewalld on RHEL) must allow traffic on the frontend port.
Dedicated subnet: Some cloud L7 load balancers (e.g., Azure Application Gateway) require their own dedicated subnet that cannot be shared with other resources.
Health probe endpoint: Use /signin (returns HTTP 200-399) or /health-check, which is available on HAProxy.
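To rule out a CN mismatch before configuring the probe, the CN of the certificate served by a frontend can be read with openssl and compared with the Host header you plan to send. This is a sketch that assumes openssl is installed and only inspects the subject CN, not subject alternative names; cn_from_subject, backend_cert_cn, and host_matches_cn are hypothetical helpers.

```shell
# cn_from_subject: reads an `openssl x509 -noout -subject` line on stdin and
# prints the CN. Handles both "CN = name" and "/CN=name" output styles.
cn_from_subject() {
  sed -n 's|.*CN *= *\([^,/]*\).*|\1|p'
}

# backend_cert_cn IP PORT SNI_NAME
# Fetches the certificate served by a frontend node and prints its CN.
backend_cert_cn() {
  openssl s_client -connect "$1:$2" -servername "$3" </dev/null 2>/dev/null \
    | openssl x509 -noout -subject | cn_from_subject
}

# host_matches_cn HOST_HEADER CN
host_matches_cn() {
  [ "$1" = "$2" ] || {
    echo "Host header '$1' does not match certificate CN '$2'"
    return 1
  }
}
```

For example, `host_matches_cn agilesec.kf-agilesec.com "$(backend_cert_cn 10.0.1.21 443 agilesec.kf-agilesec.com)"` succeeds silently when the names agree; the IP address here is a made-up example.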
Example: Azure Application Gateway with port 443
The following example demonstrates setting up an Azure Application Gateway (Standard_v2) with end-to-end SSL using a self-signed certificate. The same concepts apply to AWS ALB, GCP HTTPS Load Balancer, or on-premises solutions like NGINX and F5.
Variables
Set the following variables based on your Azure environment:
RESOURCE_GROUP="<resource-group>"
LOCATION="<azure-region>"
VNET_NAME="<vnet-name>"
SUBNET_APPGW="<dedicated-appgw-subnet>"
NSG_NAME="<nsg-name>"
APPGW_NAME="<appgw-name>"
VM_PRIVATE_IP="<frontend-node-private-ip>"
BACKEND_CERT_CN="<analytics_hostname>.<analytics_domain>" # e.g., agilesec.kf-agilesec.com
Step 1: Create a public IP for the Azure Application Gateway
az network public-ip create \
--resource-group $RESOURCE_GROUP \
--name ${APPGW_NAME}-pip \
--sku Standard \
--allocation-method Static
APPGW_PIP=$(az network public-ip show --resource-group $RESOURCE_GROUP --name ${APPGW_NAME}-pip --query ipAddress -o tsv)
Step 2: Create a frontend TLS certificate (PFX)
Azure Application Gateway requires the frontend listener certificate in PFX (PKCS#12) format.
# Self-signed for testing (use a public CA cert for production):
openssl req -x509 -nodes -days 365 -newkey rsa:2048 \
-keyout appgw.key -out appgw.crt \
-subj "/CN=$BACKEND_CERT_CN"
openssl pkcs12 -export -out appgw.pfx -inkey appgw.key -in appgw.crt -password pass:AppGwPass123
Step 3: Export the AgileSec root CA for backend trust
scp <user>@<frontend-node>:<installation-dir>/certificates/ca/agilesec-rootca-cert.pem ./backend-trusted-root.cer
Step 4: Create the Application Gateway
az network application-gateway create \
--resource-group $RESOURCE_GROUP \
--name $APPGW_NAME \
--location $LOCATION \
--vnet-name $VNET_NAME \
--subnet $SUBNET_APPGW \
--sku Standard_v2 \
--capacity 1 \
--public-ip-address ${APPGW_NAME}-pip \
--http-settings-port 443 \
--http-settings-protocol Https \
--frontend-port 443 \
--servers $VM_PRIVATE_IP \
--cert-file appgw.pfx \
--cert-password "AppGwPass123" \
--priority 100
Note: Application Gateway provisioning can take 10-20 minutes. The --servers flag accepts multiple space-separated IPs.
Step 5: Configure the backend trusted root certificate
# This uploads the AgileSec root CA certificate to the Application Gateway as a trusted root certificate. This tells the App GW "trust any backend certificate signed by this CA" which is needed for the end-to-end SSL handshake when the App GW connects to the backend over HTTPS. Without it, the App GW would reject the backend's self-signed/private CA certificate and return 502.
az network application-gateway root-cert create \
--resource-group $RESOURCE_GROUP \
--gateway-name $APPGW_NAME \
--name agilesec-backend-ca \
--cert-file backend-trusted-root.cer
# Update HTTP settings to use the trusted root cert
az network application-gateway http-settings update \
--resource-group $RESOURCE_GROUP \
--gateway-name $APPGW_NAME \
--name appGatewayBackendHttpSettings \
--protocol Https \
--port 443 \
--host-name-from-backend-pool false \
--root-certs agilesec-backend-ca
Step 6: Configure the HTTPS health probe
The --host value must match the backend certificate's CN to avoid certificate mismatch errors.
az network application-gateway probe create \
--resource-group $RESOURCE_GROUP \
--gateway-name $APPGW_NAME \
--name agilesec-health-probe \
--protocol Https \
--host $BACKEND_CERT_CN \
--path "/signin" \
--interval 30 \
--timeout 30 \
--threshold 3 \
--match-status-codes "200-399"
# Associate probe with backend HTTP settings
az network application-gateway http-settings update \
--resource-group $RESOURCE_GROUP \
--gateway-name $APPGW_NAME \
--name appGatewayBackendHttpSettings \
--probe agilesec-health-probe
Step 7: Update firewall rules
# Get the App Gateway subnet CIDR
APPGW_SUBNET_CIDR=$(az network vnet subnet show \
--resource-group $RESOURCE_GROUP \
--vnet-name $VNET_NAME \
--name $SUBNET_APPGW \
--query addressPrefix -o tsv)
# Allow App GW subnet to reach frontend nodes on port 443
az network nsg rule create \
--resource-group $RESOURCE_GROUP \
--nsg-name $NSG_NAME \
--name AllowAppGW \
--priority 120 \
--source-address-prefixes $APPGW_SUBNET_CIDR \
--destination-port-ranges 443 \
--access Allow --protocol Tcp --direction Inbound
# Allow Azure App GW infrastructure management traffic (required)
az network nsg rule create \
--resource-group $RESOURCE_GROUP \
--nsg-name $NSG_NAME \
--name AllowAppGWInfra \
--priority 130 \
--source-address-prefixes GatewayManager \
--destination-port-ranges 65200-65535 \
--access Allow --protocol Tcp --direction Inbound
Step 8: Open port 443 in the OS firewall (if firewalld is enabled)
On each frontend node:
sudo firewall-cmd --add-port=443/tcp --permanent
sudo firewall-cmd --reload
Step 9: Validate
Check backend health:
az network application-gateway show-backend-health \
--resource-group $RESOURCE_GROUP \
--name $APPGW_NAME \
--query "backendAddressPools[0].backendHttpSettingsCollection[0].servers[0]" -o json
The health field should show Healthy. Then:
Point your DNS A record to the Application Gateway public IP ($APPGW_PIP).
Access https://<analytics_hostname>.<analytics_domain> in a browser and verify the login page loads.
Log in and run a network scan as a smoke test. For smoke test details, see the single-node or multi-node installation guide.
Test sensor connectivity:
curl -k https://<analytics_hostname>.<analytics_domain>/v1/oauth2/token
# Should return 405 Method Not Allowed (needs POST), which confirms ingestion routing works
Adding a New Scan Node to an Existing Cluster
Scan nodes are asynchronous stateless worker nodes subscribing to Kafka topics to get scan requests, execute the scan, and publish data back to Kafka. Scan nodes only run HAProxy and Scheduler services.
If you want to decouple or distribute scan operations on separate nodes, you can provision one or more scan nodes and run the following installation steps on each scan node. Scan nodes use the env.backend-1 file for configuration.
Step 1: From Backend-1, Copy Certificates and Configuration to Scan Node
scp <installer_directory>/certificates/kf-agilesec.internal-certs.tgz \
$SN-1_IP:<installer_directory>/certificates
Step 2: Install Scan Node
cd <installer_directory>/certificates/
tar zxvf kf-agilesec.internal-certs.tgz
cp env.backend-1 ../.env
cd ../
sudo ./scripts/tune.sh -u <user> -r scan
./install_analytics.sh -u <user> -p <installation-dir> -r scan
Note: Both tune.sh and install_analytics.sh require the special flag -r scan for scan node installations.
Once you have one or more scan nodes, you have the option to permanently disable the Scheduler service on backend-1 and backend-2.
Multi-Region Stretch Cluster with Coordinator Node
When deploying across multiple regions, availability zones (AZ), or data centers (DC), you need an odd number of quorum-participating nodes to maintain a majority during an AZ/DC failure. The COORDINATOR profile provides a quorum tiebreaker node that runs only cluster management services without the overhead of a full backend or frontend.
In a standard topology, the PRIMARY_FRONTEND runs quorum services (MongoDB Arbiter, Kafka Controller, OpenSearch Cluster Manager) alongside the user-facing services (API, WebUI, Dashboards). The COORDINATOR profile splits these quorum services onto a dedicated node, allowing you to place the tiebreaker in a separate AZ/DC while keeping all frontends as ADDITIONAL_FRONTEND.
When to Use a Coordinator Node
Stretch clusters spanning 2+ regions or availability zones or data-centers where you need a quorum tiebreaker in a third location
When you want all frontend nodes to be identical (ADDITIONAL_FRONTEND) without one special PRIMARY_FRONTEND
When the quorum tiebreaker AZ/DC doesn't need to serve user traffic (no WebUI, API, or Dashboards needed there)
Coordinator Node Services
A coordinator node runs only quorum-participating services:
Service | Description |
|---|---|
MongoDB Arbiter | Participates in replica set elections. Does not store data. |
Kafka Controller | KRaft quorum voter (port 9094 only). Does not run a broker (no port 9092/9093). |
OpenSearch Cluster Manager | Participates in cluster manager elections. Does not store data. |
Example Topology: 7-Node Stretch Cluster
The following example shows a 7-node stretch cluster across 2 AWS regions and 3 availability zones:
Node | Region / AZ | Role |
|---|---|---|
backend-1 | us-west-1a | Primary full backend |
backend-2 | us-west-1a | Full backend |
frontend-2 | us-west-1a | Additional frontend |
backend-3 | us-west-2b | Full backend |
backend-4 | us-west-2b | Full backend |
coordinator-1 | us-west-2a | Coordinator (quorum tiebreaker) |
frontend-3 | us-west-2b | Additional frontend |

An Availability Zone (AZ) is equivalent to a datacenter or an independent failure domain within a region.
In this topology:
5 quorum voters: 4 backends + 1 coordinator (odd number for majority)
2 frontends serving user traffic (both ADDITIONAL_FRONTEND)
Coordinator in us-west-2a acts as tiebreaker between the two regions
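The quorum arithmetic behind this layout is small enough to check by hand or with a few lines of shell. The sketch below is only an illustration of majority voting; majority and survives_failure are hypothetical helper names.

```shell
# majority N: smallest number of voters that forms a majority of N.
majority() { echo $(( $1 / 2 + 1 )); }

# survives_failure TOTAL_VOTERS LOST_VOTERS
# Succeeds when the remaining voters still form a majority of the total.
survives_failure() {
  [ $(( $1 - $2 )) -ge "$(majority "$1")" ]
}
```

With 5 voters the majority is 3, so `survives_failure 5 2` succeeds: losing the two backends in one AZ leaves 3 voters. Without the coordinator there are only 4 voters, the majority is still 3, and `survives_failure 4 2` fails, which is why the tiebreaker is needed.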
Inter-Datacenter Latency Requirements
A stretch cluster requires low-latency, stable network links between datacenters. Kafka, MongoDB, and OpenSearch perform synchronous replication and quorum coordination across nodes, so inter-datacenter latency directly impacts write performance, quorum elections, and pipeline throughput.
Inter-Datacenter RTT | Minimum Throughput | Suitability | Notes |
|---|---|---|---|
< 5ms | 1 Gbps+ | Ideal | Same metro area or campus network. All services operate at full performance. |
5-20ms | 1 Gbps+ | Recommended | Nearby datacenters or neighboring regions. Tested and recommended range for stretch clusters. |
20-50ms | 500 Mbps+ | Degraded | Write latency increases. Kafka consumer group rebalancing becomes slower. Scans might work but pipeline latency increases. |
> 50ms | Any | Not recommended | Quorum coordination becomes fragile. Consumer groups may fail to stabilize. |
Zero packet loss and low jitter are as important as low latency.
These requirements apply to inter-datacenter links only. Intra-datacenter communication is assumed to be sub-millisecond with multi-Gbps throughput.
Based on stretch cluster testing with 18ms inter-datacenter latency (us-west-1 to us-west-2), all services including Kafka, MongoDB, and OpenSearch operated within acceptable parameters.
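A measured round-trip time can be mapped onto the tiers in the table above. This is a sketch: classify_rtt and avg_rtt_ms are hypothetical helpers, the boundary values 20 and 50 are assigned to the better tier as an assumption (the table leaves them ambiguous), and avg_rtt_ms assumes the usual ping summary line format.

```shell
# classify_rtt RTT_MS: prints the suitability tier for an inter-datacenter RTT.
classify_rtt() {
  awk -v rtt="$1" 'BEGIN {
    rtt = rtt + 0                      # force numeric comparison
    if (rtt < 5)        print "Ideal"
    else if (rtt <= 20) print "Recommended"
    else if (rtt <= 50) print "Degraded"
    else                print "Not recommended"
  }'
}

# avg_rtt_ms HOST: average RTT in ms over 10 pings, taken from the summary
# line ("rtt min/avg/max/mdev = ..." or "round-trip min/avg/max/stddev = ...").
avg_rtt_ms() {
  ping -c 10 -q "$1" | awk -F'/' '/^(rtt|round-trip)/ {print $5}'
}
```

For example, `classify_rtt "$(avg_rtt_ms <remote-backend-ip>)"` run from a backend in one datacenter against a backend in the other reports the tier for that link.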
Configuration
Edit generate_envs/multi_node_config.conf on backend-1:
Comment out the frontend-1 entries (coordinator replaces its quorum role):
#frontend1_private_ip=""
#frontend1_node_hostname=""
#frontend1_node_profile=""
Set the coordinator IP (uncomment and populate):
coordinator1_private_ip="<coordinator-ip>"
Append the coordinator node configuration block:
cat >> generate_envs/multi_node_config.conf <<'EOF'
coordinator1_node_hostname="coordinator-1"
coordinator1_node_profile="COORDINATOR"
EOF
Set additional backend and frontend IPs as needed for your topology (e.g., backend3_private_ip, backend4_private_ip, frontend2_private_ip, frontend3_private_ip).
Set OpenSearch index replication for AZ failover tolerance:
# In generate_envs/multi_node_config.conf, set:
opensearch_index_number_of_shards=4
opensearch_index_number_of_replicas=3
With 4 data nodes across 2 AZs, setting number_of_replicas=3 gives every shard 4 copies (1 primary plus 3 replicas), one on each data node. A full AZ failure (losing 2 data nodes) therefore still leaves 2 copies of every shard, keeping the cluster YELLOW (data accessible, some replicas unassigned) instead of RED (some primary shards unavailable).
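The replica count for other topologies can be reasoned about the same way. The sketch below computes the worst-case number of surviving copies of a shard, assuming that at most one copy of a shard is placed on any node; surviving_copies is a hypothetical helper name.

```shell
# surviving_copies REPLICAS DATA_NODES LOST_NODES
# Prints how many copies of each shard remain in the worst case, where every
# lost node held a copy of the shard.
surviving_copies() {
  local copies="$(( $1 + 1 ))" left
  # No more copies can be allocated than there are data nodes.
  if [ "$copies" -gt "$2" ]; then copies="$2"; fi
  left="$(( copies - $3 ))"
  if [ "$left" -lt 0 ]; then left=0; fi
  echo "$left"
}
```

`surviving_copies 3 4 2` prints 2 for the topology above, while `surviving_copies 1 4 2` prints 0, meaning a single replica could leave some shards with no surviving copy after an AZ failure.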
Generate environment files on backend-1:
./generate_envs/generate_envs.sh -t multi-node
This will generate env files for all nodes including env.coordinator-1. No env.frontend-1 will be generated since it was commented out.
Generate certificates on backend-1 (same as Step 3: Setup Certificates in the four-node section):
cd <installer_directory>/certificates/
./generate_certs.sh
This creates all certificates and packages them into kf-agilesec.internal-certs.tgz.
System tuning and password setup on backend-1:
cd <installer_directory>
sudo ./scripts/tune.sh -u <user>
cp .pass.example .pass
# Edit .pass and set admin_password
chmod 600 .pass
Copy cert tarball from backend-1 to all other nodes:
cd <installer_directory>/certificates/
scp kf-agilesec.internal-certs.tgz $BE2_IP:<installer_directory>/certificates/
scp kf-agilesec.internal-certs.tgz $BE3_IP:<installer_directory>/certificates/
scp kf-agilesec.internal-certs.tgz $BE4_IP:<installer_directory>/certificates/
scp kf-agilesec.internal-certs.tgz $COORD1_IP:<installer_directory>/certificates/
scp kf-agilesec.internal-certs.tgz $FE2_IP:<installer_directory>/certificates/
scp kf-agilesec.internal-certs.tgz $FE3_IP:<installer_directory>/certificates/
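The repeated scp commands can also be written as a loop. The following is a sketch only: copy_certs, INSTALLER_DIR, and DRY_RUN are hypothetical names introduced for illustration, and the node IP variables are the same placeholders used above.

```bash
# Copy the certificate tarball to every node passed as an argument.
# INSTALLER_DIR must point at the installer directory on the remote nodes.
copy_certs() {
  local tarball="kf-agilesec.internal-certs.tgz"
  local ip
  for ip in "$@"; do
    # When DRY_RUN is set, ${DRY_RUN:+echo} prints the command instead of running it.
    ${DRY_RUN:+echo} scp "$tarball" "$ip:$INSTALLER_DIR/certificates/" || return 1
  done
}

# Preview the commands first, then run without DRY_RUN from the certificates directory:
# DRY_RUN=1 copy_certs "$BE2_IP" "$BE3_IP" "$BE4_IP" "$COORD1_IP" "$FE2_IP" "$FE3_IP"
```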
On each other node, extract certs, set the correct env file, run tune.sh, and set the admin password. The process is the same as Steps 7-9 in the four-node section; only the env file name differs per node:
Node | Env file to copy |
|---|---|
backend-2 | |
backend-3 | |
backend-4 | |
coordinator-1 | |
frontend-2 | |
frontend-3 | |
On each node:
cd <installer_directory>/certificates/
tar zxvf kf-agilesec.internal-certs.tgz
cp <env-file> ../.env
cd ..
sudo ./scripts/tune.sh -u <user>
cp .pass.example .pass
# Edit .pass and set admin_password
chmod 600 .pass
Installation Order
IMPORTANT:
Kafka quorum during installation: When multiple backend nodes are installed in parallel, the Kafka KRaft quorum may go through re-elections as new nodes join. If Kafka stops running on any backend node during installation, manually restart it with ./scripts/manage.sh start kafka to help the quorum form and allow the installation to make progress.
Health check cron: As installation of each node completes, verify the health check cron job is active on all nodes (crontab -l). This cron automatically detects and restarts stopped services, which helps maintain Kafka quorum stability during normal operation.
Install backend-1 (sequential, must complete first - initializes MongoDB primary, OpenSearch cluster manager, Kafka leader):
./install_analytics.sh install -u <user> -p <installation-dir> --non-interactive
Install backend-2 (sequential, after backend-1 - adds MongoDB secondary):
./install_analytics.sh install -u <user> -p <installation-dir> --non-interactive
Install backend-3, backend-4, coordinator-1 in parallel (after backend-2 completes). The coordinator creates Kafka topics during this phase:
./install_analytics.sh install -u <user> -p <installation-dir> --non-interactive
Wait for all 3 to complete before proceeding.
Install frontend-2, frontend-3 in parallel (after step 3 completes). Frontends depend on Kafka topics created by the coordinator:
./install_analytics.sh install -u <user> -p <installation-dir> --non-interactive
Each node must have its correct .env file set, certificates extracted, and system tuning applied before running the install command.
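The ordering above (two sequential installs, then two parallel phases) can be sketched with shell job control. In this sketch install_node is a hypothetical stub that only prints the node name; replace its body with however you invoke install_analytics.sh on each node, for example over SSH.

```bash
# Hypothetical stub: replace the echo with the real remote invocation, e.g.
# ssh "$1" "cd <installer_directory> && ./install_analytics.sh install -u <user> -p <installation-dir> --non-interactive"
install_node() {
  echo "installing $1"
}

# Run a group of nodes in parallel and wait for all of them to finish.
install_parallel() {
  local pids="" pid node
  for node in "$@"; do
    install_node "$node" &
    pids="$pids $!"
  done
  for pid in $pids; do
    wait "$pid" || return 1
  done
}

install_in_order() {
  install_node backend-1 || return 1                          # must complete first
  install_node backend-2 || return 1                          # after backend-1
  install_parallel backend-3 backend-4 coordinator-1 || return 1
  install_parallel frontend-2 frontend-3                      # after phase 3 completes
}
```

Stopping at the first failed phase is a design choice of this sketch; it keeps frontends from being installed before the coordinator has created the Kafka topics.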
Post-Installation Verification
On the coordinator node, you should see 3 services:
$ ./manage.sh status
SERVICE DESCRIPTION STATUS
------------------------ ---------------------------------------- -----------
mongodb MongoDB Server Running
opensearch OpenSearch Search Engine Running
kafka Kafka Server Running
On frontend nodes (ADDITIONAL_FRONTEND), you should see 6 services:
$ ./manage.sh status
SERVICE DESCRIPTION STATUS
------------------------ ---------------------------------------- -----------
sm Security Manager Microservice Running
api Web API Microservice Running
webui Web UI Microservice Running
opensearch-dashboards OpenSearch Dashboards Running
cbom CBOM Exporter Microservice Running
haproxy HAProxy Load Balancer Running
Backend nodes should show the standard 9 services (mongodb, opensearch, kafka, scheduler, analytics-manager, ingestion, indexing, sm, haproxy).
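The expected service counts can be checked with a short filter over the status output. This is a sketch that assumes the column layout shown above, with the status as the last field on each line; count_running and check_service_count are hypothetical helper names.

```bash
# Read `manage.sh status` output on stdin and count the services
# whose last column is "Running".
count_running() {
  awk '$NF == "Running" { n++ } END { print n + 0 }'
}

# check_service_count EXPECTED
# Succeeds only if exactly EXPECTED services are reported as Running.
check_service_count() {
  local actual
  actual=$(count_running)
  [ "$actual" -eq "$1" ]
}

# Examples:
# ./manage.sh status | check_service_count 3 && echo "coordinator OK"
# ./manage.sh status | check_service_count 6 && echo "frontend OK"
# ./manage.sh status | check_service_count 9 && echo "backend OK"
```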
Quorum Considerations
The coordinator node's primary purpose is to maintain an odd quorum voter count for Kafka KRaft, MongoDB replica set, and OpenSearch cluster manager elections.
Component | Quorum Voters | Majority Needed |
|---|---|---|
Kafka KRaft | 5 (4 backends + 1 coordinator) | 3 |
MongoDB | 5 (4 backends + 1 coordinator arbiter) | 3 |
OpenSearch | 5 (4 backends + 1 coordinator) | 3 |
Placement guidance:
Place the coordinator in a different AZ from the majority of your backend nodes. This ensures that if one AZ fails, the surviving nodes (in the other AZ + coordinator) retain quorum majority.
In the example topology above, if us-west-1a fails (losing backend-1, backend-2, frontend-2), the surviving nodes in us-west-2 (backend-3, backend-4, coordinator-1, frontend-3) retain 3 of 5 quorum voters.
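The majority rule behind the table can be expressed in a few lines. The helper names below are hypothetical; the sketch only illustrates the arithmetic and does not query the running cluster.

```bash
# majority VOTERS -> minimum number of votes needed for quorum
majority() {
  echo $(( $1 / 2 + 1 ))
}

# has_quorum VOTERS SURVIVORS -> exit status 0 if the survivors form a majority
has_quorum() {
  [ "$2" -ge "$(majority "$1")" ]
}

majority 5                                   # prints 3
has_quorum 5 3 && echo "quorum retained"     # backend-3, backend-4, coordinator-1 survive
has_quorum 4 2 || echo "quorum lost"         # 4 voters split 2/2 without a coordinator
```

The last line shows why an odd voter count matters: with 4 voters split evenly across two AZs, neither side can reach the majority of 3 on its own.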
Testing Your HA Setup
Testing Frontends
Stop one of the frontends and run a network scan through the UI. The scan should complete successfully, confirming that the remaining frontend(s) can handle all traffic.
Testing Backends
Stop one of the backends and run a network scan through the UI. The scan should complete successfully, confirming the cluster remains operational with a single backend node failure.
Note: For a truly highly available cluster, ensure you can lose any single node (frontend or backend) without service interruption.
Testing Datacenter Failover (Stretch Cluster)
This test validates that the cluster survives a full datacenter/AZ failure. It applies to stretch cluster topologies with a coordinator node.
Steps:
Disable the health check cron on all nodes so that it does not restart the services you stop (crontab -l | sed 's|^\(.* .*health_check\.sh\)$|#DISABLED# \1|' | crontab -).
Stop all services on all nodes in one datacenter (e.g., backend-1, backend-2, frontend-2 in DC-1).
Run a network scan and a GitHub scan.
Both scans should complete successfully.
IMPORTANT: This test requires opensearch_index_number_of_replicas=3 and Kafka topics with replication factor matching the number of backend nodes (set during installation). Without these, some indices or partitions may become unavailable during a full AZ failure.
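The test disables the health check cron, so it needs to be restored afterwards. The following sketch wraps the disable command from the steps above as a filter and adds a matching re-enable filter. The function names are hypothetical, and the re-enable expression is an assumption derived from the #DISABLED# prefix that the disable command writes.

```bash
# Both filters read crontab text on stdin and write the result to stdout,
# so they can be tried on sample input before touching the real crontab.

# Same expression as in the test steps: comment out the health check entry.
disable_health_check() {
  sed 's|^\(.* .*health_check\.sh\)$|#DISABLED# \1|'
}

# Reverse of the above: strip the "#DISABLED# " prefix again.
enable_health_check() {
  sed 's|^#DISABLED# \(.*health_check\.sh\)$|\1|'
}

# After the failover test, on each node:
# crontab -l | enable_health_check | crontab -
# crontab -l    # confirm the health_check.sh entry is active again
```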