Current model

Production CXB Core fleet nodes use one pre-forked worker per call slot.
MAX_CONCURRENT_CALLS = 1
CXBCORE_WORKERS = 16
1 fleet host = 16 active calls
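Each worker is a uvicorn process. nginx routes to the workers' Unix sockets using least_conn, with max_conns=1 on each upstream server. CXB Core also enforces MAX_CONCURRENT_CALLS=1 inside each worker, so a call that reaches an already-busy worker is rejected instead of sharing that worker.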
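The two capacity guards can be illustrated with a small Python model. This is a sketch under stated assumptions, not CXB Core code: Worker, pick_worker, and the socket paths are hypothetical, and a worker's active_calls stands in for the connection count nginx tracks. pick_worker approximates least_conn with max_conns=1; try_start_call is the in-worker MAX_CONCURRENT_CALLS check.

```python
from dataclasses import dataclass
from typing import List, Optional

MAX_CONCURRENT_CALLS = 1  # in-worker cap, as configured on the fleet nodes
CXBCORE_WORKERS = 16      # pre-forked uvicorn workers per fleet host


@dataclass
class Worker:
    socket_path: str       # hypothetical Unix socket path for this worker
    active_calls: int = 0  # stands in for the connection count nginx sees

    def try_start_call(self) -> bool:
        """In-worker guard: refuse a second call (CXB Core would answer 429)."""
        if self.active_calls >= MAX_CONCURRENT_CALLS:
            return False
        self.active_calls += 1
        return True

    def end_call(self) -> None:
        self.active_calls = max(0, self.active_calls - 1)


def pick_worker(workers: List[Worker], max_conns: int = 1) -> Optional[Worker]:
    """Approximate nginx least_conn with max_conns: least-loaded worker under the cap.

    Ties are broken by list order here; nginx breaks ties with weighted round-robin.
    """
    eligible = [w for w in workers if w.active_calls < max_conns]
    if not eligible:
        return None  # every upstream is at max_conns
    return min(eligible, key=lambda w: w.active_calls)


workers = [Worker(f"/run/cxbcore/worker-{i}.sock") for i in range(CXBCORE_WORKERS)]
started = 0
while (w := pick_worker(workers)) is not None and w.try_start_call():
    started += 1
print(started)  # 16: one call per worker, then the host is full
```

Keeping both guards means a routing mistake at the nginx layer is still caught inside the worker.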

Why one call per worker

Voice calls are long-lived, stateful, CPU-sensitive workloads. A single active call may hold:
  • STT streaming connection
  • LLM context and tool state
  • TTS streaming/generation
  • VAD/turn detection
  • recorder and transcript buffers
  • provider latency and retry state
Keeping one call per worker gives predictable failure isolation. If one worker wedges or crashes, it drops one call, not a batch.

Current outbound fleet routing

CXB API stores fleet URLs in system settings. For dialout:
  1. CXB API health-checks every fleet URL at /health/fleet.
  2. It skips unreachable or full servers.
  3. It picks the healthy server with the most available capacity.
  4. It sends /livekit/dialout to that fleet server.
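Steps 2 and 3 reduce to a filter and a max. The Python sketch below assumes each /health/fleet poll has already been parsed into a hypothetical FleetStatus with a reachable flag and a count of free call slots; the field names and example URLs are illustrative. reserve_locally mimics the local reservation CXB Dialler makes for campaign attempts, so back-to-back picks between polls do not all land on the same server.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FleetStatus:
    url: str
    reachable: bool  # did the /health/fleet request succeed?
    available: int   # free call slots reported by the fleet (assumed field)


def select_fleet(statuses: List[FleetStatus]) -> Optional[FleetStatus]:
    """Skip unreachable or full servers, then pick the one with the most free slots."""
    candidates = [s for s in statuses if s.reachable and s.available > 0]
    if not candidates:
        return None  # no capacity anywhere: the dial-out cannot be placed
    return max(candidates, key=lambda s: s.available)


def reserve_locally(status: FleetStatus) -> None:
    """Count the slot as taken until the next health poll refreshes the numbers."""
    status.available -= 1


statuses = [
    FleetStatus("https://fleet.example.com", reachable=True, available=3),
    FleetStatus("https://fleet2.example.com", reachable=True, available=5),
    FleetStatus("https://fleet3.example.com", reachable=False, available=16),
]
picks = []
for _ in range(4):
    chosen = select_fleet(statuses)
    if chosen is None:
        break
    reserve_locally(chosen)
    picks.append(chosen.url)  # /livekit/dialout would be sent to this URL
print(picks)
```

The unreachable fleet3 is never chosen despite reporting the most slots, and the local decrement spreads consecutive picks once fleet2's lead shrinks.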
CXB Dialler follows a similar approach for campaign attempts: it polls fleet capacity, reserves the best server locally, and attaches answered calls.

Campaign attach adds one host-local detail. /attach is a short HTTP POST that starts long-running call work in the selected worker, so nginx sees the POST finish while the worker is still logically busy. If nginx later routes another attach to that same worker, CXB Core returns 429 at_capacity even when another worker on the host is free. Production fleet nginx therefore needs exact route blocks for /attach, /livekit/dialout, and /livekit/widget that retry http_429 across the upstream pool:
proxy_next_upstream error timeout http_429 non_idempotent;
proxy_next_upstream_tries <worker-count>;
Do not apply that retry block to /livekit/dispatch until its rejection path is side-effect-free. The current path can remove the inbound SIP participant before returning, so retrying the request on another worker is not safe.
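Why the retry is needed, and why it is only safe for side-effect-free rejections, can be shown with a small Python model. Worker, attach, and proxy_attach are hypothetical names; proxy_attach approximates proxy_next_upstream http_429 with proxy_next_upstream_tries by simply trying the next worker in list order (nginx uses its balancing method to choose the next upstream). Because the first /attach POST has already completed, nginx counts no open connection on the busy worker and may pick it again.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Worker:
    name: str
    busy: bool = False  # logically busy with call work after its /attach POST returned

    def attach(self) -> int:
        """Hypothetical /attach handler. It rejects before touching any state,
        which is what makes a retry on another worker safe."""
        if self.busy:
            return 429  # at_capacity
        self.busy = True  # long-running call work continues after the response
        return 200


def proxy_attach(workers: List[Worker], start: int, tries: int) -> Tuple[int, Optional[str]]:
    """Try up to `tries` workers, moving on whenever one answers 429."""
    status = 429
    for attempt in range(min(tries, len(workers))):
        worker = workers[(start + attempt) % len(workers)]
        status = worker.attach()
        if status != 429:
            return status, worker.name
    return status, None


workers = [Worker(f"worker-{i}") for i in range(4)]
workers[0].attach()  # first call: its POST is done, but worker-0 is still busy

no_retry = proxy_attach(workers, start=0, tries=1)                # (429, None)
with_retry = proxy_attach(workers, start=0, tries=len(workers))  # (200, 'worker-1')
print(no_retry, with_retry)
```

A handler that mutated state before rejecting, as the current /livekit/dispatch path can by removing the SIP participant, would already have acted on the call by the time the retry reached the next worker.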

Multi-fleet ingress (HAProxy)

A deployment can run more than one fleet host. For example, a production deployment can run two fleets, fleet.cxbridge.io and fleet2.cxbridge.io, behind an HAProxy ingress at calls.cxbridge.io that distributes inbound WebSocket calls (telephony, Exotel) across both fleets. The per-host nginx + max_conns=1 model is unchanged behind the ingress: HAProxy only spreads load across fleets, while each fleet's nginx still pins one call per worker.

HAProxy ingress gotchas (learned in production):
Gotcha | Rule
retry-on and 429 | HAProxy retry-on does not retry on 429; a busy-fleet 429 must be handled explicitly, not via generic retry.
Body-based routing | Use an anchored rstring body check. A leading-comma (non-anchored) match black-holes both fleets.
haproxy -c | Only validates syntax. Validate routing against a captured request body, not just a config parse.
Fleet availability bound | The fleet_available health signal is bounded to the worker-count range (1–16); treat out-of-range values as unhealthy.
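In the spirit of the haproxy -c rule, a match pattern can be checked offline against captured bodies before it reaches the ingress. The Python sketch below assumes a compact JSON health body containing a fleet_available field; the body shape and both patterns are assumptions, Python's re module stands in for HAProxy's regex engine, and it does not reproduce the specific leading-comma failure. It shows how an anchored pattern accepts only whole values from 1 to 16, while an unanchored one also accepts out-of-range values such as 17 or 160.

```python
import re

# Anchored: the whole value must be 1-16 and be followed by a JSON delimiter.
ANCHORED = re.compile(r'"fleet_available":(?:1[0-6]|[1-9])(?=[,}])')
# Unanchored: any value that merely starts with a non-zero digit passes.
UNANCHORED = re.compile(r'"fleet_available":[1-9]')


def fleet_is_available(body: str, pattern: "re.Pattern[str]" = ANCHORED) -> bool:
    return pattern.search(body) is not None


# Captured-style bodies (assumed shape) for a few values, including out-of-range ones.
bodies = {n: '{"fleet_available":%d,"status":"ok"}' % n for n in (0, 3, 16, 17, 160)}
for n, body in bodies.items():
    print(n, fleet_is_available(body), fleet_is_available(body, UNANCHORED))
```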
The HAProxy ingress is brand-portable via render.sh (templated config) and add-fleet.sh (adds a fleet backend without hand-editing). See WSS ingress.

WSS URL behind a load balancer

The public URL should remain stable:
wss://fleet.example.com/ws/{bot_id}
wss://fleet.example.com/exotel/{bot_id}
The load balancer must keep each WebSocket pinned to the backend worker selected during upgrade. The critical property is not the URL; it is capacity-aware routing so one worker does not receive two simultaneous calls.
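Pinning plus capacity awareness can be modeled in a few lines of Python. PinningBalancer is hypothetical and stands in for whatever load balancer is used: a worker is chosen only at upgrade time, and only if it has no active call; every later frame on that connection follows the stored mapping; the slot is freed when the socket closes. The bot_id in the URL plays no part in the choice, which is why the public URL can stay stable.

```python
from typing import Dict, List, Optional


class PinningBalancer:
    def __init__(self, workers: List[str]) -> None:
        self.free: List[str] = list(workers)  # workers with no active call
        self.pinned: Dict[str, str] = {}      # connection id -> worker

    def upgrade(self, conn_id: str) -> Optional[str]:
        """Choose a worker once, at WebSocket upgrade time."""
        if not self.free:
            return None  # no free worker: reject rather than put two calls on one
        worker = self.free.pop(0)
        self.pinned[conn_id] = worker
        return worker

    def route_frame(self, conn_id: str) -> str:
        """Every later frame on the connection goes to the pinned worker."""
        return self.pinned[conn_id]

    def close(self, conn_id: str) -> None:
        """Free the worker's slot when the call's socket closes."""
        self.free.append(self.pinned.pop(conn_id))


lb = PinningBalancer(["w0", "w1"])
a = lb.upgrade("call-a")
b = lb.upgrade("call-b")
c = lb.upgrade("call-c")  # None: both workers already hold a call
print(a, b, c, lb.route_frame("call-a"))
```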

Scale math

At the current 16-call host shape:
Target active calls | Approx. fleet hosts
100 | 7
500 | 32
1,000 | 63
20,000 | 1,250
This is why 1K scale is not just “more Docker containers.” The routing, registry, observability, rollout, and cost model all matter.
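The host counts above are the target divided by 16 calls per host, rounded up. A quick Python check; note that this arithmetic includes no spare hosts for failures or rollouts.

```python
import math

CALLS_PER_HOST = 16  # CXBCORE_WORKERS (16) x MAX_CONCURRENT_CALLS (1)


def hosts_needed(target_calls: int, calls_per_host: int = CALLS_PER_HOST) -> int:
    """Round up: a partially used host still has to exist."""
    return math.ceil(target_calls / calls_per_host)


for calls in (100, 500, 1_000, 20_000):
    print(f"{calls:>6} calls -> {hosts_needed(calls)} hosts")
```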

What must change for 1K

Current | 1K-ready direction
Static list of fleet URLs | Worker or node registration with TTL.
Health polling every fleet URL | Central capacity registry with atomic reservations.
One CXB Dialler loop | Partitioned dialler workers or campaign leases.
Per-host nginx config | Generated/containerized LB config or service discovery.
Host-local logs | Central logs, metrics, and alerts.
nginx retry blocks for short async POST routes | Explicit slot reservation or worker-aware routing before attach.
Manual node addition | Scripted or orchestrated fleet provisioning.
Suggested order of work:
  1. Dockerize without changing behavior.
  2. Keep 1 worker = 1 call slot.
  3. Replace per-host systemd workers with worker containers.
  4. Add a per-host LB with maxconn=1 backends for the cheapest production version.
  5. Add a central slot registry before moving beyond tens of hosts.
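The central slot registry, together with the table's "registration with TTL" and "atomic reservations" rows, can be sketched in Python. SlotRegistry, heartbeat, reserve, and release are hypothetical names, and an in-process lock stands in for the atomic operations a real shared store would have to provide. Nodes whose registration expires stop receiving calls without anyone editing a URL list.

```python
import threading
import time
from typing import Callable, Dict, Optional


class SlotRegistry:
    """Nodes heartbeat with a TTL; reservations are atomic under one lock."""

    def __init__(self, ttl_s: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl_s = ttl_s
        self.clock = clock
        self.lock = threading.Lock()
        self.free: Dict[str, int] = {}       # node -> free call slots
        self.expires: Dict[str, float] = {}  # node -> registration expiry time

    def heartbeat(self, node: str, free_slots: int) -> None:
        """Register or refresh a node. This simply overwrites the free count;
        a real registry would also reconcile reservations still in flight."""
        with self.lock:
            self.free[node] = free_slots
            self.expires[node] = self.clock() + self.ttl_s

    def reserve(self) -> Optional[str]:
        """Pick the live node with the most free slots and take one, atomically."""
        with self.lock:
            now = self.clock()
            live = [n for n, exp in self.expires.items() if exp > now and self.free[n] > 0]
            if not live:
                return None
            node = max(live, key=lambda n: self.free[n])
            self.free[node] -= 1
            return node

    def release(self, node: str) -> None:
        """Return a slot when a call ends."""
        with self.lock:
            if node in self.free:
                self.free[node] += 1


now = [0.0]
registry = SlotRegistry(ttl_s=10.0, clock=lambda: now[0])  # fake clock for the example
registry.heartbeat("host-a", free_slots=2)
registry.heartbeat("host-b", free_slots=1)
picks = [registry.reserve() for _ in range(4)]
print(picks)  # host-a, host-a, host-b, then None once every slot is taken
```

Because selection and decrement happen under the same lock, two concurrent callers can never both take a node's last slot.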