A single fleet host handles up to 16 concurrent calls. To put multiple fleet hosts behind one carrier-facing hostname for inbound WebSocket calls, an HAProxy WSS ingress sits between the carriers (telephony dialler, Exotel) and the fleets. A typical deployment uses calls.cxbridge.io fronting both fleet.cxbridge.io and fleet2.cxbridge.io. All ingress assets live in the Core service repo under infra/haproxy/ and are brand-portable — nothing client-specific is baked into the templates.

Routing scope

The ingress fronts only long-lived WebSocket call routes. Short POST routes are deliberately rejected because they return before the call pipeline finishes — connection-count load balancing does not represent worker occupancy for them.
| Route | Behavior |
| --- | --- |
| /ws/{bot_id} | Proxied to a fleet (telephony dialler WebSocket) |
| /exotel/{bot_id} | Proxied to a fleet (Exotel Voicebot applet) |
| /haproxy-health | Local probe, returns 200 ok |
| /attach, /livekit/dialout, /livekit/widget, /livekit/dispatch | Rejected (404); these use CXB API/CXB Dialler app-level fleet selection via /health/fleet |
| anything else | 404 from HAProxy |
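
The local and rejected rows of the route table can be smoke-checked from any workstation. The sketch below is illustrative only: the function names are hypothetical, it assumes /haproxy-health is reachable on the same HTTPS listener as the call routes, and it assumes the rejected routes return 404 for a plain GET as well as for POST. The two proxied WebSocket routes are left out because a plain HTTP request to them is answered by the fleet, not by HAProxy.

```shell
#!/bin/sh
# Sketch only: smoke-check the non-proxied rows of the route table.

# http_status URL -> prints the HTTP status code curl received (000 on connect failure).
http_status() {
    curl -sk -o /dev/null -w '%{http_code}' --max-time 5 "$1"
}

# check_route BASE_URL PATH EXPECTED -> prints one result line, returns 1 on mismatch.
check_route() {
    actual=$(http_status "$1$2")
    if [ "$actual" = "$3" ]; then
        echo "ok   $2 -> $actual"
    else
        echo "FAIL $2 -> $actual (expected $3)"
        return 1
    fi
}

# check_ingress_routes BASE_URL -> runs every check, returns non-zero if any failed.
check_ingress_routes() {
    failed=0
    check_route "$1" /haproxy-health 200 || failed=1
    # /no-such-route stands in for the "anything else" row.
    for path in /attach /livekit/dialout /livekit/widget /livekit/dispatch /no-such-route; do
        check_route "$1" "$path" 404 || failed=1
    done
    return "$failed"
}
```

Usage, after sourcing the functions: `check_ingress_routes https://calls.cxbridge.io`.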

Capacity-aware health check

leastconn plus per-server maxconn 16 mirrors the per-host worker count, but maxconn only counts sessions HAProxy itself routed. Direct fleet-URL traffic, /attach, and /livekit/* bypass the ingress, so real worker occupancy can exceed HAProxy’s count. To avoid routing into a saturated fleet, the health check inspects the /health/fleet JSON body:
http-check expect rstring "\"fleet_available\":([1-9]|1[0-6])[,}]"
When a fleet’s fleet_available reaches 0 the body stops matching, HAProxy marks that backend DOWN regardless of its own session count, and leastconn routes only to fleets with real capacity left.
Pattern anatomy and footguns (PCRE2):
  • The leading " anchors the key without requiring a preceding comma. A pattern that started with a comma would stop matching the day fleet_available serializes as the first key in its object (the preceding character is then {, not ,), and because both fleets run identical code it would mark both DOWN at once. The quote alone already rejects decoys such as prev_fleet_available or max_fleet_available.
  • Trailing [,}] terminates the value so it matches the whole number.
  • Range 1..16 is intentionally coupled to maxconn 16. If you scale a fleet past 16 workers, fleet_available exceeds 16, the pattern stops matching, and the healthy higher-capacity fleet is marked DOWN — during the exact “add capacity” operation. When you raise maxconn, raise this upper bound in the same change (template + README) and re-validate.
  • haproxy -c only validates syntax — it cannot confirm the expression matches the live body. A wrong anchor validates clean, then black-holes the ingress at runtime. Always validate against a captured body before reload:
curl -sk https://fleet.cxbridge.io/health/fleet \
  | grep -Pq '"fleet_available":([1-9]|1[0-6])[,}]' && echo OK || echo FAIL
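
The footguns above can also be pinned down as offline regression cases, so a pattern change is exercised against boundary values before it touches a live fleet. This is a minimal sketch: the sample bodies and the workers key are made up for illustration, and fleet_state is a hypothetical helper. It uses grep -E only so it runs where grep lacks PCRE support; the expression uses no PCRE-only syntax, so both engines agree on these cases. It complements the captured-body check, it does not replace it.

```shell
#!/bin/sh
# Pattern copied from the http-check expect rstring line, without the config escaping.
PATTERN='"fleet_available":([1-9]|1[0-6])[,}]'

# fleet_state BODY -> prints UP if the body would pass the health check, else DOWN.
fleet_state() {
    if printf '%s' "$1" | grep -Eq "$PATTERN"; then
        echo UP
    else
        echo DOWN
    fi
}

# Boundary cases (bodies are illustrative, not captured from a real fleet).
fleet_state '{"fleet_available":0,"workers":16}'              # saturated fleet
fleet_state '{"fleet_available":16}'                          # upper bound, last key
fleet_state '{"fleet_available":17}'                          # past the coupled maxconn
fleet_state '{"prev_fleet_available":5,"fleet_available":0}'  # decoy key
```

The four calls print DOWN, UP, DOWN, DOWN. Note that a body serialized with a space after the colon, such as `{"fleet_available": 5}`, also yields DOWN with this pattern, so a change in the JSON serializer's whitespace deserves the same re-validation as a change in maxconn.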

Check timing and flapping tradeoff

Fleet server lines use check inter 1s fall 2 rise 2: a fleet must fail two consecutive 1s probes (~2s) before ejection, and pass two before returning. This is not hair-trigger (fall 1) on purpose:
  • With only two fleets, ejecting on a single transient slow /health/fleet response dumps all carrier load onto the one survivor — manufacturing the exact saturation the check was meant to prevent. Fast ejection + low fleet count produces flapping cascades. fall 2 smooths single-probe blips.
  • The cost: a ~2s window where a newly-saturated fleet can still receive a session or two before ejection. Those land at fleet nginx and return 429, which the carrier sees as-is.
  • With a third fleet, fall 1 becomes safer because a single survivor is no longer the failure mode — revisit then.
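
To make the fall/rise tradeoff concrete, the sketch below replays a sequence of probe results through a simplified consecutive-count model. It is an assumption-laden toy, not HAProxy's actual check state machine, and simulate is a hypothetical helper; it only shows how many probes each setting takes to eject and readmit a server.

```shell
#!/bin/sh
# simulate FALL RISE PROBE...   (PROBE is P for a passed check, F for a failed one)
# Prints the server state after each probe, starting from UP.
simulate() {
    fall=$1; rise=$2; shift 2
    state=UP; streak=0; out=""
    for probe in "$@"; do
        if [ "$state" = UP ]; then
            # Count consecutive failures; any pass resets the streak.
            if [ "$probe" = F ]; then streak=$((streak + 1)); else streak=0; fi
            if [ "$streak" -ge "$fall" ]; then state=DOWN; streak=0; fi
        else
            # Count consecutive passes; any failure resets the streak.
            if [ "$probe" = P ]; then streak=$((streak + 1)); else streak=0; fi
            if [ "$streak" -ge "$rise" ]; then state=UP; streak=0; fi
        fi
        out="$out${out:+ }$state"
    done
    echo "$out"
}

simulate 2 2 P F P F F P P   # fall 2: the single blip at probe 2 is absorbed
simulate 1 2 P F P F F P P   # fall 1: the same blip ejects the fleet
```

With fall 2 the output is `UP UP UP UP DOWN DOWN UP`; with fall 1 it is `UP DOWN DOWN DOWN DOWN DOWN UP`. On the same probe history the fleet spends two probe intervals out of rotation instead of five.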

Retries and 429

option redispatch + retry-on conn-failure empty-response response-timeout retries a different fleet on connection-level failures.
HAProxy 2.8 retry-on only accepts the HTTP status codes 401, 403, 404, 408, 425, 500, 501, 502, 503, and 504, so 429 is not retryable and is passed through to the carrier as-is. Rewriting 429 → 503 to force a retry would lie about the real status. If 429 churn becomes operationally significant after the health-check tightening above, add another fleet.

TLS and cert renewal

The :443 bind asserts a TLS floor of TLSv1.2 explicitly rather than relying on the OS OpenSSL policy. Cipher selection is left to the OpenSSL default to avoid rejecting a carrier’s TLS stack. Cert renewal uses HTTP-01 via HAProxy — no downtime:
  1. Cron triggers certbot renew.
  2. Certbot starts a temporary listener on 127.0.0.1:8888.
  3. HAProxy’s :80 frontend routes /.well-known/acme-challenge/* to that listener.
  4. Let’s Encrypt validates, certbot writes the new cert.
  5. The deploy hook (renewal-hook.sh) atomically rebuilds the combined /etc/haproxy/certs/<domain>.pem and reloads HAProxy.
This requires authenticator = standalone and http01_port = 8888 in /etc/letsencrypt/renewal/<domain>.conf. Verify with certbot renew --dry-run.
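
The two renewal prerequisites and the atomic rebuild in step 5 can be sketched as follows. This is not the repo's renewal-hook.sh: both function names are hypothetical, and the sketch assumes the standard certbot lineage layout (fullchain.pem and privkey.pem in one directory, which certbot exposes to deploy hooks as RENEWED_LINEAGE). The reload itself is left to the caller.

```shell
#!/bin/sh
# Sketch only: not the repo's renewal-hook.sh.

# renewal_conf_ok FILE -> succeeds if both settings the HTTP-01 flow needs are present.
renewal_conf_ok() {
    grep -Eq '^[[:space:]]*authenticator[[:space:]]*=[[:space:]]*standalone[[:space:]]*$' "$1" &&
    grep -Eq '^[[:space:]]*http01_port[[:space:]]*=[[:space:]]*8888[[:space:]]*$' "$1"
}

# build_combined_pem LINEAGE_DIR OUT_PEM
# Concatenates the certificate chain and private key into a temp file next to
# OUT_PEM, then renames it into place so HAProxy never reads a partial file.
build_combined_pem() {
    tmp=$(mktemp "$2.XXXXXX") || return 1
    if cat "$1/fullchain.pem" "$1/privkey.pem" > "$tmp" && chmod 600 "$tmp"; then
        mv -f "$tmp" "$2"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

Writing the temp file in the same directory as the target keeps the final mv a rename on one filesystem, which is what makes the swap atomic. A deploy hook would call `build_combined_pem "$RENEWED_LINEAGE" /etc/haproxy/certs/<domain>.pem` and reload HAProxy only if that call succeeds.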

Tooling

| Script | Purpose |
| --- | --- |
| render.sh <domain> [fleet-file] [out-dir] | Renders the brand-agnostic templates for one deployment, substituting __INGRESS_DOMAIN__ and __FLEET_SERVERS__ (from fleet-servers.txt). Writes haproxy.cfg + the renewal hook, guards against surviving placeholders, and runs a structural haproxy -c (cert + DNS excluded, validated on the target host at install). Safe to run on CI/laptop. |
| add-fleet.sh [--dry-run] <hostname> | Adds an inbound fleet to the live ingress: pre-checks (root, DNS, /health/fleet), clones the last server line so the new fleet inherits the exact flags, backs up, validates, reloads zero-downtime, and waits for the backend to read UP, rolling back automatically on any failure. |
Do not hand-edit server lines in /etc/haproxy/haproxy.cfg — use add-fleet.sh. It exists precisely so ops don’t have to understand HAProxy syntax, SNI, or the health-check coupling.

Add a fleet — two paths

Adding a fleet touches two independent paths. Do both.

Step 1: Provision and deploy

Deploy CXB Core on the new host (standard playbook). Confirm https://<new-host>/health/fleet responds.

Step 2: Path (1), inbound WSS (add-fleet.sh)

On the ingress host, run ./add-fleet.sh fleet3.cxbridge.io (or --dry-run to preview). The script clones, validates, reloads, and rolls back on failure. Keep fleet-servers.txt in the repo in sync (append the same line) so a future re-render matches the box — the script edits the live config only.

Step 3: Path (2), outbound (CXB API)

Add https://<new-host> to the CXB API fleet list (CXB Console → Settings → Fleet). The fleet picker picks it up on the next dialout — no restart, no code change. The ingress is WSS-only, so this step is what makes the fleet usable for outbound calls.
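
Because add-fleet.sh edits only the live config, drift between the box and the repo is easy to miss. The sketch below compares the server lines of the fleet backend with the repo file. It rests on assumptions the docs do not confirm: that fleet-servers.txt holds complete server lines, one per fleet, and that the backend is named cxbcore_wss_fleet as in the Operations commands. All three function names are hypothetical.

```shell
#!/bin/sh
# backend_servers CFG BACKEND -> whitespace-normalised server lines of one backend, sorted.
backend_servers() {
    awk -v be="$2" '
        # A new section keyword starts or ends the backend we care about.
        $1 ~ /^(global|defaults|frontend|backend|listen)$/ {
            inside = ($1 == "backend" && $2 == be); next
        }
        # Assigning $1 to itself rebuilds the record with single spaces.
        inside && $1 == "server" { $1 = $1; print }
    ' "$1" | sort
}

# repo_servers FLEET_FILE -> whitespace-normalised server lines from the repo file, sorted.
repo_servers() {
    awk '$1 == "server" { $1 = $1; print }' "$1" | sort
}

# fleet_file_in_sync CFG FLEET_FILE [BACKEND] -> succeeds when both hold the same server lines.
fleet_file_in_sync() {
    [ "$(backend_servers "$1" "${3:-cxbcore_wss_fleet}")" = "$(repo_servers "$2")" ]
}
```

Restricting the comparison to one backend matters because the config also contains server lines that are not fleets, for example whatever backend forwards ACME challenges to certbot. Usage: `fleet_file_in_sync /etc/haproxy/haproxy.cfg fleet-servers.txt || echo "repo and live config differ"`.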

Sizing and single point of failure

| Fleets | Concurrent calls | Ingress sizing |
| --- | --- | --- |
| 1 | 16 | n/a (no ingress) |
| 2 | 32 | 4 vCPU / 8 GB single VM |
| 3 | 48 | 4 vCPU / 8 GB single VM |
| 4-5 | 64-80 | 8 vCPU; monitor TLS handshake CPU |
| 6+ | 96+ | second ingress + managed LB or VRRP |
The ingress is a single point of failure until the HA phase. Plan the second ingress before crossing ~3 fleets.
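
The sizing table is simply 16 concurrent calls per fleet mapped onto tiers. A small helper makes the arithmetic explicit for capacity planning; ingress_plan is a hypothetical name and the tier boundaries are copied from the table, not derived from measurements.

```shell
#!/bin/sh
# ingress_plan FLEETS -> prints "<concurrent calls> | <ingress sizing>"
ingress_plan() {
    fleets=$1
    calls=$((fleets * 16))   # 16 concurrent calls per fleet host
    if [ "$fleets" -le 1 ]; then
        sizing="n/a (no ingress)"
    elif [ "$fleets" -le 3 ]; then
        sizing="4 vCPU / 8 GB single VM"
    elif [ "$fleets" -le 5 ]; then
        sizing="8 vCPU; monitor TLS handshake CPU"
    else
        sizing="second ingress + managed LB or VRRP"
    fi
    echo "$calls | $sizing"
}

ingress_plan 3
```

The call prints `48 | 4 vCPU / 8 GB single VM`.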

Operations

# Backend status
# "show stat" CSV columns used here: $2 = svname, $5 = scur, $7 = slim, $18 = status
echo "show stat" | socat - /run/haproxy/admin.sock | \
    awk -F, '$1=="cxbcore_wss_fleet" && $2 ~ /^fleet/ {print $2, $18, "scur="$5, "slim="$7}'

# Drain one backend (no new sessions, existing keep going)
echo "set server cxbcore_wss_fleet/fleet1 state drain" | socat - /run/haproxy/admin.sock

# Restore
echo "set server cxbcore_wss_fleet/fleet1 state ready" | socat - /run/haproxy/admin.sock

# Stats UI (SSH tunnel from workstation)
ssh -L 8404:127.0.0.1:8404 <ingress-host>   # then open http://localhost:8404/stats
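
For alerting or a pre-change gate, the backend status command can be turned into a check that fails when any fleet is out of rotation. This is a sketch under assumptions: fleets_not_up is a hypothetical helper, it reads "show stat" CSV on stdin, and it treats any status that does not start with UP (DOWN, DRAIN, MAINT and so on) as not serving.

```shell
#!/bin/sh
# fleets_not_up BACKEND   (reads "show stat" CSV on stdin)
# Prints "<server> <status>" for every server of BACKEND whose status does not
# start with UP, and returns 1 if there is at least one such server.
fleets_not_up() {
    awk -F, -v be="$1" '
        BEGIN { bad = 0 }
        # Skip the FRONTEND/BACKEND summary rows; column 18 is the status.
        $1 == be && $2 != "FRONTEND" && $2 != "BACKEND" && $18 !~ /^UP/ {
            print $2, $18
            bad = 1
        }
        END { exit bad }
    '
}
```

Usage: `echo "show stat" | socat - /run/haproxy/admin.sock | fleets_not_up cxbcore_wss_fleet`.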