Current model
Production CXB Core fleet nodes use one pre-forked worker per call slot. The per-host nginx balances across those workers with least_conn and max_conns=1. CXB Core also enforces MAX_CONCURRENT_CALLS=1 inside each worker.
Why one call per worker
Voice calls are long-lived, stateful, CPU-sensitive workloads. A single active call may hold:
- STT streaming connection
- LLM context and tool state
- TTS streaming/generation
- VAD/turn detection
- recorder and transcript buffers
- provider latency and retry state
Current outbound fleet routing
CXB API stores fleet URLs in system settings. For dialout:
- CXB API health-checks every fleet URL at /health/fleet.
- It skips unreachable or full servers.
- It picks the healthy server with the most available capacity.
- It sends /livekit/dialout to that fleet server.
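The selection steps above can be sketched roughly as follows. This is an illustration only, not the CXB API implementation: the `FleetHealth` shape, the `available` field, and the injected `probe` callable (which would issue the GET to `/health/fleet`) are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class FleetHealth:
    """Hypothetical result of probing <fleet-url>/health/fleet."""
    url: str
    available: int  # free call slots reported by the fleet (assumed field)


def pick_fleet(
    urls: Iterable[str],
    probe: Callable[[str], FleetHealth],
) -> Optional[str]:
    """Return the URL of the healthy fleet with the most free slots, or None."""
    best: Optional[FleetHealth] = None
    for url in urls:
        try:
            health = probe(url)
        except OSError:
            # Unreachable server: skip it.
            continue
        if health.available <= 0:
            # Full server: skip it.
            continue
        if best is None or health.available > best.available:
            best = health
    return best.url if best is not None else None
```

The caller would then send `/livekit/dialout` to the returned URL, and treat `None` as "no capacity anywhere".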
/attach is a short HTTP POST that starts long-running call work in the selected worker. nginx can see the POST finish while the worker remains logically busy. If nginx later routes another attach to that same worker, CXB Core returns 429 at_capacity even if another worker on the host is free.
Production fleet nginx therefore needs exact route blocks for /attach, /livekit/dialout, and /livekit/widget that retry http_429 across the upstream pool.
Do not add the same retry to /livekit/dispatch until its rejection path is side-effect-free. The current path can remove the inbound SIP participant before returning.
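To make the retry behavior concrete, here is a small Python model of what retrying on 429 across the upstream pool achieves. It is not nginx configuration and not CXB code; `attach_with_retry`, the worker names, and the `send` callable are hypothetical names for illustration.

```python
from typing import Callable, Optional, Sequence, Tuple


def attach_with_retry(
    workers: Sequence[str],
    send: Callable[[str], int],
) -> Tuple[int, Optional[str]]:
    """Try each worker once; move on only when a worker answers 429.

    Returns (status, worker) for the first non-429 answer, or (429, None)
    when every worker in the pool reported at_capacity.
    """
    for worker in workers:
        status = send(worker)
        if status != 429:
            return status, worker
    return 429, None
```

Without the retry, the first busy worker's 429 would be returned to the caller even though a later worker in the pool is free. The same loop is unsafe for a route whose rejection has side effects, which is why /livekit/dispatch is excluded.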
Multi-fleet ingress (HAProxy)
A single deployment can run more than one fleet host. For example, a production deployment can run two fleets, fleet.cxbridge.io and fleet2.cxbridge.io, fronted by an HAProxy ingress at calls.cxbridge.io that distributes inbound WebSocket calls (telephony, Exotel) across both fleets. The per-host nginx + max_conns=1 model is unchanged behind the ingress; HAProxy only spreads load across fleets, while each fleet’s nginx still pins one call per worker.
HAProxy ingress gotchas (learned in production):
| Gotcha | Rule |
|---|---|
| retry-on and 429 | HAProxy retry-on does not retry on 429; a busy-fleet 429 must be handled explicitly, not via generic retry. |
| Body-based routing | Use an anchored rstring body check. A leading-comma (non-anchored) match black-holes both fleets. |
| haproxy -c | Only validates syntax. Validate routing against a captured request body, not just config parse. |
| Fleet availability bound | fleet_available health signal is bounded to the worker count range (1–16); treat out-of-range values as unhealthy. |
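The availability bound in the last row could be enforced with a check along these lines. This is a sketch under assumptions: the function name is hypothetical, and it assumes the signal arrives as an integer that has already been parsed from the health response.

```python
def fleet_available_is_healthy(value: object, min_slots: int = 1, max_slots: int = 16) -> bool:
    """Treat fleet_available as healthy only inside the worker count range."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        return False
    return min_slots <= value <= max_slots
```

A value of 0, a negative number, or anything above 16 is treated as unhealthy rather than trusted as real capacity.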
Ingress scripts: render.sh (templated config) and add-fleet.sh (adds a fleet backend without hand-editing). See WSS ingress.
WSS URL behind a load balancer
The public URL should remain stable.
Scale math
At the current 16-call host shape:
| Target active calls | Approx fleet hosts |
|---|---|
| 100 | 7 |
| 500 | 32 |
| 1,000 | 63 |
| 20,000 | 1,250 |
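The host counts in the table are the target call count divided by 16 calls per host, rounded up. A minimal sketch of that arithmetic (the function name is hypothetical, and it ignores headroom for failures or uneven load):

```python
CALLS_PER_HOST = 16  # current host shape: 16 workers, one call each


def hosts_needed(target_calls: int, calls_per_host: int = CALLS_PER_HOST) -> int:
    """Ceiling division without floating point."""
    return -(-target_calls // calls_per_host)


for target in (100, 500, 1_000, 20_000):
    print(target, hosts_needed(target))
# 100 7
# 500 32
# 1000 63
# 20000 1250
```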
What must change for 1K
| Current | 1K-ready direction |
|---|---|
| Static list of fleet URLs | Worker or node registration with TTL. |
| Health polling every fleet URL | Central capacity registry with atomic reservations. |
| One CXB Dialler loop | Partitioned dialler workers or campaign leases. |
| Per-host nginx config | Generated/containerized LB config or service discovery. |
| Host-local logs | Central logs, metrics, and alerts. |
| nginx retry blocks for short async POST routes | Explicit slot reservation or worker-aware routing before attach. |
| Manual node addition | Scripted or orchestrated fleet provisioning. |
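As a rough illustration of "registration with TTL" plus "atomic reservations", the sketch below keeps both in one in-process registry. Everything here is an assumption for the example: the class and method names are hypothetical, and a real central registry would live in a shared store rather than in one process's memory.

```python
import threading
import time
from typing import Callable, Dict, Optional


class SlotRegistry:
    """In-memory model: workers heartbeat with a TTL, callers reserve one slot."""

    def __init__(self, ttl_seconds: float = 10.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._lock = threading.Lock()
        self._ttl = ttl_seconds
        self._clock = clock
        # worker_id -> {"expires": float, "call": Optional[str]}
        self._workers: Dict[str, dict] = {}

    def heartbeat(self, worker_id: str) -> None:
        """Register a worker or extend its registration by one TTL."""
        with self._lock:
            entry = self._workers.setdefault(worker_id, {"expires": 0.0, "call": None})
            entry["expires"] = self._clock() + self._ttl

    def reserve(self, call_id: str) -> Optional[str]:
        """Atomically claim one live, idle worker; None means at capacity."""
        with self._lock:
            now = self._clock()
            for worker_id, entry in self._workers.items():
                if entry["expires"] > now and entry["call"] is None:
                    entry["call"] = call_id
                    return worker_id
            return None

    def release(self, worker_id: str) -> None:
        """Mark the worker idle again when its call ends."""
        with self._lock:
            entry = self._workers.get(worker_id)
            if entry is not None:
                entry["call"] = None
```

Because the check and the claim happen under one lock, two concurrent reservations cannot land on the same worker, which is the property the nginx 429 retry blocks only approximate.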
Recommended path
- Dockerize without changing behavior.
- Keep 1 worker = 1 call slot.
- Replace per-host systemd workers with worker containers.
- Add a per-host LB with maxconn=1 backends for the cheapest production version.
- Add a central slot registry before moving beyond tens of hosts.