WSS ingress (HAProxy)

A single fleet host handles up to 16 concurrent calls. To put multiple fleet hosts behind one carrier-facing hostname for inbound WebSocket calls, an HAProxy WSS ingress sits between the carriers (telephony dialler, Exotel) and the fleets. A typical deployment uses calls.cxbridge.io fronting both fleet.cxbridge.io and fleet2.cxbridge.io. All ingress assets live in the Core service repo under infra/haproxy/ and are brand-portable — nothing client-specific is baked into the templates.

Routing scope

The ingress fronts only long-lived WebSocket call routes. Short POST routes are deliberately rejected: they return before the call pipeline finishes, so connection-count load balancing does not reflect worker occupancy for them.

| Route | Behavior |
| --- | --- |
| `/ws/{bot_id}` | Proxied to a fleet (telephony dialler WebSocket) |
| `/exotel/{bot_id}` | Proxied to a fleet (Exotel Voicebot applet) |
| `/haproxy-health` | Local probe, returns `200 ok` |
| `/attach`, `/livekit/dialout`, `/livekit/widget`, `/livekit/dispatch` | Rejected (`404`); these use CXB API/CXB Dialler app-level fleet selection via `/health/fleet` |
| anything else | `404` from HAProxy |
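The routing table can be read as a simple path classifier. The sketch below is a shell model of it, not the HAProxy ACLs themselves; `classify_route` is a hypothetical helper, and treating a bare `/ws/` (no bot id) as a 404 is an assumption.

```shell
# Shell model of the ingress routing table (illustrative only; the real
# rules live in the HAProxy template). classify_route is a hypothetical name.
classify_route() {
  case "$1" in
    /haproxy-health)
      echo "local-200" ;;            # answered by HAProxy itself
    /ws/?*|/exotel/?*)
      echo "proxy" ;;                # long-lived WebSocket call routes
    /attach|/livekit/dialout|/livekit/widget|/livekit/dispatch)
      echo "reject-404" ;;           # app-level fleet selection, not the ingress
    *)
      echo "reject-404" ;;           # anything else
  esac
}

for p in /ws/bot123 /exotel/bot123 /haproxy-health /attach /livekit/widget /; do
  printf '%-18s %s\n' "$p" "$(classify_route "$p")"
done
```

A classifier like this is handy as a quick reference when deciding whether a new route belongs behind the ingress at all.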

Capacity-aware health check

leastconn plus per-server maxconn 16 mirrors the per-host worker count, but maxconn only counts sessions HAProxy itself routed. Direct fleet-URL traffic, /attach, and /livekit/* bypass the ingress, so real worker occupancy can exceed HAProxy’s count. To avoid routing into a saturated fleet, the health check inspects the /health/fleet JSON body:

http-check expect rstring "\"fleet_available\":([1-9]|1[0-6])[,}]"

When a fleet’s fleet_available reaches 0, the body stops matching and the check fails. After the configured number of consecutive failures (fall), HAProxy marks that server DOWN regardless of its own session count, and leastconn routes only to fleets with real capacity left.

Pattern anatomy and footguns (PCRE2):

The leading " anchors the key with no preceding comma. A pattern that started with a comma would stop matching as soon as fleet_available serialized as the first key in its object (the preceding character would be {, not ,), marking both identical fleets DOWN at once. The quote alone already rejects decoys like prev_fleet_available / max_fleet_available.
The trailing [,}] terminates the value so the pattern matches the whole number. Without it, 17 would match on its leading 1.
Range 1..16 is intentionally coupled to maxconn 16. If you scale a fleet past 16 workers, fleet_available exceeds 16, the pattern stops matching, and the healthy higher-capacity fleet is marked DOWN — during the exact “add capacity” operation. When you raise maxconn, raise this upper bound in the same change (template + README) and re-validate.
haproxy -c only validates syntax — it cannot confirm the expression matches the live body. A wrong anchor validates clean, then black-holes the ingress at runtime. Always validate against a captured body before reload:

curl -sk https://fleet.cxbridge.io/health/fleet \
  | grep -Pq '"fleet_available":([1-9]|1[0-6])[,}]' && echo OK || echo FAIL
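To see the anchors at work without touching a live fleet, the pattern can be run against hand-written bodies. This is a minimal sketch: the sample JSON bodies are invented (only the fleet_available key comes from this page), `fleet_body_ok` is a hypothetical helper, and it uses `grep -E` rather than `-P` because the pattern contains no PCRE-only syntax.

```shell
# Same pattern as the health check, written as an extended regex.
FLEET_PATTERN='"fleet_available":([1-9]|1[0-6])[,}]'

# fleet_body_ok BODY: exit 0 if the health check would pass for BODY.
fleet_body_ok() {
  printf '%s' "$1" | grep -Eq "$FLEET_PATTERN"
}

# Invented sample bodies covering the cases discussed above.
for body in \
  '{"fleet_available":5,"workers":16}' \
  '{"workers":16,"fleet_available":16}' \
  '{"fleet_available":0,"workers":16}' \
  '{"fleet_available":17}' \
  '{"prev_fleet_available":5,"fleet_available":0}' \
  '{"fleet_available": 5}'
do
  if fleet_body_ok "$body"; then echo "UP   $body"; else echo "DOWN $body"; fi
done
```

The last body is worth noting: the pattern expects compact serialization, so a space after the colon fails the match even though capacity is available. If the serializer ever changes its spacing, validate against a captured body before reloading.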

Check timing and flapping tradeoff

Fleet server lines use check inter 1s fall 2 rise 2: a fleet must fail two consecutive 1s probes (~2s) before ejection, and pass two before returning. This is not hair-trigger (fall 1) on purpose:

With only two fleets, ejecting on a single transient slow /health/fleet response dumps all carrier load onto the one survivor — manufacturing the exact saturation the check was meant to prevent. Fast ejection + low fleet count produces flapping cascades. fall 2 smooths single-probe blips.
The cost: a ~2s window where a newly-saturated fleet can still receive a session or two before ejection. Those land at fleet nginx and return 429, which the carrier sees as-is.
With a third fleet, fall 1 becomes safer because a single survivor is no longer the failure mode — revisit then.
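The fall/rise tradeoff is easier to reason about with a small simulation. The sketch below is a simplified model, assuming a server that starts UP and only two states; real HAProxy tracks intermediate states as well. `simulate_checks` is a hypothetical helper.

```shell
# simulate_checks FALL RISE RESULT...
# RESULT values are probe outcomes in order: 1 = pass, 0 = fail.
# Prints the server state after each probe, space-separated.
simulate_checks() {
  fall=$1; rise=$2; shift 2
  state=UP; fails=0; passes=0; out=""
  for r in "$@"; do
    if [ "$r" -eq 1 ]; then
      passes=$((passes + 1)); fails=0
      # A DOWN server returns only after RISE consecutive passes.
      if [ "$state" = DOWN ] && [ "$passes" -ge "$rise" ]; then state=UP; fi
    else
      fails=$((fails + 1)); passes=0
      # An UP server is ejected only after FALL consecutive failures.
      if [ "$state" = UP ] && [ "$fails" -ge "$fall" ]; then state=DOWN; fi
    fi
    out="$out${out:+ }$state"
  done
  echo "$out"
}

# One isolated slow probe, then a real two-probe outage, then recovery.
echo "fall 2: $(simulate_checks 2 2 1 0 1 0 0 1 1)"
echo "fall 1: $(simulate_checks 1 2 1 0 1 0 0 1 1)"
```

With the same probe sequence, fall 2 rides out the isolated failure and ejects only on the sustained one, while fall 1 ejects on the first blip and keeps the server out until two consecutive passes arrive.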

Retries and 429

option redispatch combined with retry-on conn-failure empty-response response-timeout retries the request on a different fleet after a connection-level failure.

HAProxy 2.8 retry-on only accepts the HTTP status codes 401, 403, 404, 408, 425, 500, 501, 502, 503, 504 — 429 is not retryable and is passed through to the carrier as-is. Rewriting 429 → 503 to force a retry would lie about real status. If 429 churn becomes operationally significant after the health-check tightening above, add another fleet.
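A small guard can catch attempts to add an unsupported status to retry-on before the config reaches `haproxy -c`. This is a sketch only; `retry_on_accepts_status` is a hypothetical helper that encodes the status list stated above for HAProxy 2.8 and would need updating for other versions.

```shell
# retry_on_accepts_status CODE: exit 0 if CODE is in the status list that
# retry-on accepts per the text above (HAProxy 2.8), exit 1 otherwise.
retry_on_accepts_status() {
  case "$1" in
    401|403|404|408|425|500|501|502|503|504) return 0 ;;
    *) return 1 ;;
  esac
}

for code in 429 503; do
  if retry_on_accepts_status "$code"; then
    echo "$code: accepted by retry-on"
  else
    echo "$code: not accepted, passed through to the carrier"
  fi
done
```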

TLS and cert renewal

The :443 bind asserts a TLS floor of TLSv1.2 explicitly rather than relying on the OS OpenSSL policy. Cipher selection is left to the OpenSSL default to avoid rejecting a carrier’s TLS stack. Cert renewal uses HTTP-01 via HAProxy — no downtime:

Cron triggers certbot renew.
Certbot starts a temporary listener on 127.0.0.1:8888.
HAProxy’s :80 frontend routes /.well-known/acme-challenge/* to that listener.
Let’s Encrypt validates, certbot writes the new cert.
The deploy hook (renewal-hook.sh) atomically rebuilds the combined /etc/haproxy/certs/<domain>.pem and reloads HAProxy.

This requires authenticator = standalone and http01_port = 8888 in /etc/letsencrypt/renewal/<domain>.conf. Verify with certbot renew --dry-run.
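The "atomically rebuilds" step usually means writing the combined PEM to a temporary file and renaming it into place. The sketch below illustrates that pattern under stated assumptions; it is not the contents of renewal-hook.sh, and `build_combined_pem` is a hypothetical helper. It relies on certbot's standard fullchain.pem and privkey.pem file names.

```shell
# build_combined_pem LIVE_DIR OUT_PEM
# Concatenates LIVE_DIR/fullchain.pem and LIVE_DIR/privkey.pem into OUT_PEM.
# The data is written to a temp file next to OUT_PEM and then renamed, so
# HAProxy never reads a half-written certificate.
build_combined_pem() {
  live_dir=$1; out_pem=$2
  tmp=$(mktemp "${out_pem}.XXXXXX") || return 1
  if cat "$live_dir/fullchain.pem" "$live_dir/privkey.pem" > "$tmp"; then
    chmod 600 "$tmp" && mv -f "$tmp" "$out_pem"
  else
    rm -f "$tmp"      # leave any existing OUT_PEM untouched on failure
    return 1
  fi
}
```

Inside a certbot deploy hook, the live directory is available as `$RENEWED_LINEAGE`, so a call might look like `build_combined_pem "$RENEWED_LINEAGE" /etc/haproxy/certs/<domain>.pem` followed by the HAProxy reload. The reload command itself is deployment-specific and is left out here.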

Tooling

| Script | Purpose |
| --- | --- |
| `render.sh <domain> [fleet-file] [out-dir]` | Renders the brand-agnostic templates for one deployment, substituting `__INGRESS_DOMAIN__` and `__FLEET_SERVERS__` (from `fleet-servers.txt`). Writes `haproxy.cfg` + the renewal hook, guards against surviving placeholders, and runs a structural `haproxy -c` (cert + DNS excluded, validated on the target host at install). Safe to run on CI/laptop. |
| `add-fleet.sh [--dry-run] <hostname>` | Adds an inbound fleet to the live ingress: pre-checks (root, DNS, `/health/fleet`), clones the last `server` line so the new fleet inherits the exact flags, backs up, validates, reloads zero-downtime, and waits for the backend to read UP, rolling back automatically on any failure. |

Do not hand-edit server lines in /etc/haproxy/haproxy.cfg — use add-fleet.sh. It exists precisely so ops don’t have to understand HAProxy syntax, SNI, or the health-check coupling.
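The "guards against surviving placeholders" step in render.sh can be pictured as a substitute-then-scan pass. The sketch below is not the real script: `render_template` is a hypothetical helper, the template line is invented, and only the `__INGRESS_DOMAIN__` substitution is shown.

```shell
# render_template TEMPLATE DOMAIN
# Prints TEMPLATE with __INGRESS_DOMAIN__ replaced by DOMAIN. Fails if any
# __UPPER_CASE__ placeholder is still present after substitution.
render_template() {
  rendered=$(sed "s/__INGRESS_DOMAIN__/$2/g" "$1") || return 1
  if printf '%s\n' "$rendered" | grep -Eq '__[A-Z][A-Z_]*__'; then
    echo "unrendered placeholder left in $1" >&2
    return 1
  fi
  printf '%s\n' "$rendered"
}

# Invented one-line template, for demonstration only.
tpl=$(mktemp)
printf 'bind :443 ssl crt /etc/haproxy/certs/__INGRESS_DOMAIN__.pem\n' > "$tpl"
render_template "$tpl" calls.cxbridge.io
rm -f "$tpl"
```

Failing hard on a leftover placeholder matters because `haproxy -c` may still accept a config where a placeholder ended up inside a string or path.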

Add a fleet — two paths

Adding a fleet touches two independent paths. Do both.

Provision and deploy

Deploy CXB Core on the new host (standard playbook). Confirm https://<new-host>/health/fleet responds.

(1) Inbound WSS — add-fleet.sh

On the ingress host, run ./add-fleet.sh fleet3.cxbridge.io (or --dry-run to preview). The script clones, validates, reloads, and rolls back on failure. Keep fleet-servers.txt in the repo in sync (append the same line) so a future re-render matches the box — the script edits the live config only.

(2) Outbound — CXB API

Add https://<new-host> to the CXB API fleet list (CXB Console → Settings → Fleet). The fleet picker picks it up on the next dialout — no restart, no code change. The ingress is WSS-only, so this step is what makes the fleet usable for outbound calls.

Sizing and single point of failure

| Fleets | Concurrent calls | Ingress sizing |
| --- | --- | --- |
| 1 | 16 | n/a (no ingress) |
| 2 | 32 | 4 vCPU / 8 GB single VM |
| 3 | 48 | 4 vCPU / 8 GB single VM |
| 4-5 | 64-80 | 8 vCPU; monitor TLS handshake CPU |
| 6+ | 96+ | second ingress + managed LB or VRRP |

The ingress is a single point of failure until the HA phase. Plan the second ingress before crossing ~3 fleets.
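The table follows directly from 16 workers per fleet. A rough helper for capacity planning, assuming every fleet keeps the 16-call limit (`ingress_tier` is a hypothetical name; the tiers are copied from the table above):

```shell
WORKERS_PER_FLEET=16   # per-host concurrent call limit from this page

# ingress_tier FLEETS: print the concurrent-call ceiling and the sizing tier.
ingress_tier() {
  fleets=$1
  calls=$((fleets * WORKERS_PER_FLEET))
  if   [ "$fleets" -le 1 ]; then tier="no ingress"
  elif [ "$fleets" -le 3 ]; then tier="4 vCPU / 8 GB single VM"
  elif [ "$fleets" -le 5 ]; then tier="8 vCPU"
  else                           tier="second ingress + managed LB or VRRP"
  fi
  echo "$calls calls: $tier"
}

for n in 1 2 3 5 6; do
  echo "$n fleet(s) -> $(ingress_tier "$n")"
done
```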

Operations

# Backend status (show stat CSV columns: 2=svname, 5=scur, 7=slim, 18=status)
echo "show stat" | socat - /run/haproxy/admin.sock | \
    awk -F, '$1=="cxbcore_wss_fleet" && $2 ~ /^fleet/ {print $2, $18, "scur="$5, "slim="$7}'

# Drain one backend (no new sessions, existing keep going)
echo "set server cxbcore_wss_fleet/fleet1 state drain" | socat - /run/haproxy/admin.sock

# Restore
echo "set server cxbcore_wss_fleet/fleet1 state ready" | socat - /run/haproxy/admin.sock

# Stats UI (SSH tunnel from workstation)
ssh -L 8404:127.0.0.1:8404 <ingress-host>   # then open http://localhost:8404/stats
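The status filter can be rehearsed offline against canned CSV, which is useful for confirming column positions before running it on the ingress host. In the `show stat` CSV, scur is column 5, smax column 6, slim column 7, and status column 18. The sample rows below are invented, and `fleet_status` is a hypothetical wrapper name.

```shell
# fleet_status: read `show stat` CSV on stdin and print, for each fleet
# server in the cxbcore_wss_fleet backend: name, status, scur, slim.
fleet_status() {
  awk -F, '$1=="cxbcore_wss_fleet" && $2 ~ /^fleet/ {print $2, $18, "scur="$5, "slim="$7}'
}

# Invented sample, truncated to the first 18 columns of the CSV header.
fleet_status <<'EOF'
# pxname,svname,qcur,qmax,scur,smax,slim,stot,bin,bout,dreq,dresp,ereq,econ,eresp,wretr,wredis,status
cxbcore_wss_fleet,fleet1,0,0,3,9,16,120,0,0,,0,,0,0,0,0,UP
cxbcore_wss_fleet,fleet2,0,0,0,16,16,98,0,0,,0,,0,0,0,0,DOWN
cxbcore_wss_fleet,BACKEND,0,0,3,20,200,218,0,0,0,0,,0,0,0,0,UP
EOF
```

The BACKEND summary row is skipped because its svname does not start with fleet.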
