CXB Core runs as multiple pre-forked uvicorn processes behind an nginx reverse proxy. Each process listens on a Unix domain socket and handles exactly one call at a time, enforced by MAX_CONCURRENT_CALLS=1 inside CXB Core and nginx max_conns=1 for long-lived connections.
Architecture
Each worker is a single-process uvicorn instance (--workers 1) listening on a Unix socket in /tmp/. nginx distributes incoming connections using least_conn with max_conns=1. For WebSocket-style long-lived connections, that pins one live call to one worker. For short POST call-start routes such as /attach, nginx needs explicit retry-on-429 route blocks because the HTTP request returns before the background call pipeline ends.
Workers are pre-forked at boot: all imports (the media pipeline, on-device model runtime, STT/LLM/TTS clients) are loaded before the first call arrives, so there is no cold-start penalty.
total_capacity = CXBCORE_WORKERS × 1 (one call per worker)
| Setting | Default | Description |
|---|---|---|
| MAX_CONCURRENT_CALLS | 1 | Per-worker limit (enforced by nginx max_conns=1) |
| CXBCORE_WORKERS | 16 | Number of pre-forked worker processes |
| Total | 16 | Concurrent calls per server |
When all workers are busy, call-start routes return capacity errors (429 at_capacity from CXB Core or a proxy error if no backend can accept the request). The fail_timeout=10 parameter marks a busy/crashed worker as unavailable for 10 seconds.
Async call-start routes
Campaign and LiveKit call-start endpoints can be short HTTP requests that leave long-running call work active on the selected worker after the response is returned:
/attach
/livekit/dialout
/livekit/widget
Because nginx only sees the short HTTP connection, least_conn can choose a worker that is already busy. CXB Core then correctly rejects that worker-local request with 429 at_capacity, even when other workers on the same host are free. Production nginx must retry those specific routes across the upstream pool:
location = /attach {
    proxy_pass http://cxbcore_workers;
    proxy_http_version 1.1;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;
    proxy_next_upstream error timeout http_429 non_idempotent;
    proxy_next_upstream_tries 16;
    proxy_connect_timeout 5s;
    proxy_read_timeout 3600s;
    proxy_send_timeout 3600s;
}
Use the same block for /livekit/dialout and /livekit/widget, with proxy_next_upstream_tries equal to the worker count on that host.
Do not apply this retry block to /livekit/dispatch until its rejection path is side-effect-free. Its current rejection path can remove the inbound SIP participant before returning.
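The failure mode and the retry fix can be illustrated with a small simulation. This is a sketch under simplifying assumptions, not nginx's actual implementation: `Worker`, `start_call`, and `max_tries` are hypothetical names, selection is reduced to "fewest open proxy connections, ties broken by list order", and `max_tries` stands in for proxy_next_upstream_tries.

```python
from dataclasses import dataclass


@dataclass
class Worker:
    name: str
    open_connections: int = 0   # what the proxy can see
    call_active: bool = False   # what only the worker knows

    def handle_call_start(self) -> int:
        """Return an HTTP status the way a one-call-per-worker backend would."""
        if self.call_active:
            return 429  # at_capacity
        self.call_active = True  # background call pipeline keeps running
        return 200


def start_call(workers: list[Worker], max_tries: int) -> tuple[int, list[str]]:
    """Try workers in least-connections order, moving on after each 429."""
    tried: list[str] = []
    # sorted() is stable, so workers with equal counts keep their list order.
    candidates = sorted(workers, key=lambda w: w.open_connections)
    for worker in candidates[:max_tries]:
        tried.append(worker.name)
        status = worker.handle_call_start()
        if status != 429:
            return status, tried
    return 429, tried


pool = [Worker(f"cxbcore_{i}") for i in range(1, 5)]
# Worker 1 is mid-call, but its /attach request already returned,
# so the proxy sees zero open connections on it.
pool[0].call_active = True

print(start_call(pool, max_tries=1))          # (429, ['cxbcore_1'])
print(start_call(pool, max_tries=len(pool)))  # (200, ['cxbcore_1', 'cxbcore_2'])
```

With a single try the request is rejected even though three workers are idle. With tries equal to the pool size, the request reaches the first idle worker.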
Systemd configuration
CXB Core uses a systemd template unit. Worker N gets socket /tmp/cxbcore_N.sock:
# /etc/systemd/system/cxbcore@.service
[Unit]
Description=CXB Core Worker %i
After=network.target
[Service]
Type=exec
Environment=CXBCORE_SOCKET=/tmp/cxbcore_%i.sock
ExecStart=/opt/core/scripts/run_worker.sh
WorkingDirectory=/opt/core
EnvironmentFile=/opt/core/.env
Restart=always
RestartSec=3
[Install]
WantedBy=multi-user.target
The run script detects socket mode and starts uvicorn on the assigned socket:
#!/usr/bin/env bash
# /opt/core/scripts/run_worker.sh
set -euo pipefail
SOCKET="${CXBCORE_SOCKET:-}"
PORT="${CXBCORE_PORT:-8001}"
HOST="${CXBCORE_HOST:-127.0.0.1}"
if [ -n "$SOCKET" ]; then
    # Unix domain socket mode (pre-forked architecture)
    rm -f "$SOCKET"
    exec uv run uvicorn cxbcore.app:app \
        --uds "$SOCKET" \
        --workers 1 \
        --app-dir src \
        --log-level "${LOG_LEVEL:-info}"
else
    # TCP port mode (legacy / development)
    exec uv run uvicorn cxbcore.app:app \
        --host "$HOST" \
        --port "$PORT" \
        --workers 1 \
        --app-dir src \
        --log-level "${LOG_LEVEL:-info}"
fi
Never change --workers in uvicorn. Each systemd instance IS one worker. Uvicorn’s --workers flag would fork additional processes, breaking the capacity model.
Service management
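# Restart all loaded workers (stop works the same way)
systemctl restart 'cxbcore@*'
# The glob only matches units already loaded, so start stopped workers by name
systemctl start cxbcore@{1..16}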
# Restart a single worker (zero downtime — others keep serving)
systemctl restart cxbcore@2
# View logs (all workers)
journalctl -u 'cxbcore@*' -f
# View logs (single worker)
journalctl -u cxbcore@3 -f
# Verify all sockets exist
ls /tmp/cxbcore_*.sock | wc -l
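Counting paths with ls does not show which worker is missing, or whether a path is a real socket rather than a stale regular file. A minimal sketch of a stricter check follows. `missing_sockets` is a hypothetical helper, not part of CXB Core, and reading the worker count from a CXBCORE_WORKERS environment variable is an assumption.

```python
import os
import stat


def missing_sockets(worker_count: int, socket_dir: str = "/tmp") -> list[str]:
    """Return expected worker socket paths that are absent or not sockets."""
    missing = []
    for n in range(1, worker_count + 1):
        path = os.path.join(socket_dir, f"cxbcore_{n}.sock")
        try:
            mode = os.stat(path).st_mode
        except FileNotFoundError:
            missing.append(path)
            continue
        if not stat.S_ISSOCK(mode):
            missing.append(path)
    return missing


if __name__ == "__main__":
    # Assumption: the worker count is exported under this name; default is 16.
    count = int(os.environ.get("CXBCORE_WORKERS", "16"))
    for path in missing_sockets(count):
        print(f"missing or not a socket: {path}")
```

An empty result means every expected socket path exists. It does not prove that the worker behind it is accepting connections.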
nginx configuration
upstream cxbcore_workers {
    least_conn;
    server unix:/tmp/cxbcore_1.sock max_conns=1 fail_timeout=10;
    server unix:/tmp/cxbcore_2.sock max_conns=1 fail_timeout=10;
    ...
    server unix:/tmp/cxbcore_16.sock max_conns=1 fail_timeout=10;
}
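Because the upstream block has one server line per worker, it is easier to generate than to maintain by hand when the worker count changes. The following is a sketch only: `render_upstream` is a hypothetical helper, and the canonical source remains the fleet template in the Core service repo.

```python
def render_upstream(
    worker_count: int,
    name: str = "cxbcore_workers",
    socket_dir: str = "/tmp",
    fail_timeout: int = 10,
) -> str:
    """Render an nginx upstream block with one server line per worker socket."""
    lines = [f"upstream {name} {{", "    least_conn;"]
    for n in range(1, worker_count + 1):
        lines.append(
            f"    server unix:{socket_dir}/cxbcore_{n}.sock "
            f"max_conns=1 fail_timeout={fail_timeout};"
        )
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_upstream(3))
```

The same worker count should also be used for proxy_next_upstream_tries in the retry route blocks, so both values stay in step.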
Use map $http_upgrade for the Connection header — hardcoded Connection "upgrade" breaks HTTP POST endpoints like dialout (returns 422).
The canonical fleet nginx template is tracked in the Core service repo at infra/nginx/fleet.conf.template.
map $http_upgrade $connection_upgrade {
    default upgrade;
    "" "";
}
server {
    location / {
        proxy_pass http://cxbcore_workers;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection $connection_upgrade;
        proxy_read_timeout 3600s;
        proxy_send_timeout 3600s;
    }
}
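The effect of the map can be modeled in a few lines. This sketch covers only the Upgrade and Connection headers and uses hypothetical function names. It relies on the documented nginx behavior that a proxy_set_header field with an empty value is not passed to the backend.

```python
def connection_header(http_upgrade: str) -> str:
    """Mirror the map: empty Upgrade gives an empty value, anything else 'upgrade'."""
    return "" if http_upgrade == "" else "upgrade"


def proxied_headers(client_headers: dict[str, str]) -> dict[str, str]:
    """Return the Upgrade/Connection headers the backend would receive."""
    upgrade = client_headers.get("Upgrade", "")
    candidate = {"Upgrade": upgrade, "Connection": connection_header(upgrade)}
    # nginx drops proxy_set_header fields whose value is an empty string.
    return {key: value for key, value in candidate.items() if value != ""}


if __name__ == "__main__":
    print(proxied_headers({"Upgrade": "websocket"}))  # WebSocket handshake
    print(proxied_headers({}))                        # plain POST, e.g. dialout
```

A WebSocket handshake keeps both headers, while a plain POST reaches the worker with neither of them.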
Workers only listen on Unix sockets — all external traffic goes through nginx.
Multi-fleet ingress
A single fleet host (nginx + 16 workers) handles up to 16 concurrent calls. To front multiple fleet hosts behind one carrier-facing hostname for inbound WebSocket calls, a separate HAProxy WSS ingress sits in front of the fleets. See WSS ingress for the HAProxy layer, its capacity-aware health check, and the two-path procedure for adding a fleet.
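As a rough sizing aid, the capacity formula extends to multiple fleet hosts. This is a back-of-the-envelope sketch with hypothetical function names. It assumes every host runs the same worker count and leaves out any headroom for restarts or failed workers.

```python
import math

CALLS_PER_WORKER = 1  # MAX_CONCURRENT_CALLS


def host_capacity(workers: int = 16) -> int:
    """Concurrent calls one fleet host can carry."""
    return workers * CALLS_PER_WORKER


def hosts_needed(target_concurrent_calls: int, workers_per_host: int = 16) -> int:
    """Smallest number of fleet hosts that covers the target concurrency."""
    return math.ceil(target_concurrent_calls / host_capacity(workers_per_host))


if __name__ == "__main__":
    print(host_capacity())   # 16
    print(hosts_needed(40))  # 3
```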