CXB Core runs as multiple pre-forked uvicorn processes behind an nginx reverse proxy. Each process listens on a Unix domain socket and handles exactly one call at a time, enforced by MAX_CONCURRENT_CALLS=1 inside CXB Core and nginx max_conns=1 for long-lived connections.

Architecture

Each worker is a single-process uvicorn instance (--workers 1) listening on a Unix socket in /tmp/. nginx distributes incoming connections using least_conn with max_conns=1. For WebSocket-style long-lived connections, that pins one live call to one worker. Short POST call-start routes such as /attach need explicit retry-on-429 route blocks in nginx, because the HTTP request returns before the background call pipeline ends. Workers are pre-forked at boot, so all imports (the media pipeline, on-device model runtime, STT/LLM/TTS clients) are loaded before the first call arrives and there is no cold-start penalty.

Capacity formula

total_capacity = CXBCORE_WORKERS × 1  (one call per worker)

Setting               Default  Description
MAX_CONCURRENT_CALLS  1        Per-worker limit (enforced by nginx max_conns=1)
CXBCORE_WORKERS       16       Number of pre-forked worker processes
Total                 16       Concurrent calls per server

When all workers are busy, call-start routes return capacity errors (429 at_capacity from CXB Core or a proxy error if no backend can accept the request). The fail_timeout=10 parameter marks a busy/crashed worker as unavailable for 10 seconds.
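
As a rough illustration of the capacity formula, the arithmetic can be sketched in shell. The defaults mirror the documented settings; the function names total_capacity and free_slots are hypothetical helpers introduced here, not part of CXB Core.

```shell
#!/usr/bin/env bash
# Sketch of the capacity formula; defaults mirror the documented settings.

# total_capacity -> CXBCORE_WORKERS x MAX_CONCURRENT_CALLS for this host.
total_capacity() {
    local workers per_worker
    workers="${CXBCORE_WORKERS:-16}"
    per_worker="${MAX_CONCURRENT_CALLS:-1}"
    echo $(( workers * per_worker ))
}

# free_slots BUSY -> calls the host can still accept (hypothetical helper).
free_slots() {
    local busy total
    busy="$1"
    total="$(total_capacity)"
    if [ "$busy" -ge "$total" ]; then
        echo 0
    else
        echo $(( total - busy ))
    fi
}

echo "capacity: $(total_capacity)"
```

When free_slots reaches 0, new call-start requests on that host are the ones that receive the capacity errors described above.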

Async call-start routes

Campaign and LiveKit call-start endpoints can be short HTTP requests that leave long-running call work active on the selected worker after the response is returned:
  • /attach
  • /livekit/dialout
  • /livekit/widget
Because nginx only sees the short HTTP connection, least_conn can choose a worker that is already busy. CXB Core then correctly rejects that worker-local request with 429 at_capacity, even when other workers on the same host are free. Production nginx must retry those specific routes across the upstream pool:
location = /attach {
    proxy_pass http://cxbcore_workers;
    proxy_http_version 1.1;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;
    proxy_next_upstream error timeout http_429 non_idempotent;
    proxy_next_upstream_tries 16;
    proxy_connect_timeout 5s;
    proxy_read_timeout 3600s;
    proxy_send_timeout 3600s;
}
Use the same block for /livekit/dialout and /livekit/widget, with proxy_next_upstream_tries equal to the worker count on that host.
Do not apply this retry block to /livekit/dispatch until its rejection path is side-effect-free. Its current rejection path can remove the inbound SIP participant before returning.
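
Since the three retry blocks differ only in the route and the retry count, one way to keep them in sync is to generate them. The following is a minimal sketch, assuming the blocks are rendered by a script and pasted or included into the server block; render_retry_location is a hypothetical helper, and the directives are copied from the /attach block above.

```shell
#!/usr/bin/env bash
# Sketch: print the retry-on-429 location block for one route.
# render_retry_location ROUTE WORKER_COUNT (hypothetical helper)
render_retry_location() {
    local route tries
    route="$1"
    tries="$2"
    cat <<EOF
location = ${route} {
    proxy_pass http://cxbcore_workers;
    proxy_http_version 1.1;
    proxy_set_header Host \$host;
    proxy_set_header X-Real-IP \$remote_addr;
    proxy_set_header X-Forwarded-For \$proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto \$scheme;
    proxy_next_upstream error timeout http_429 non_idempotent;
    proxy_next_upstream_tries ${tries};
    proxy_connect_timeout 5s;
    proxy_read_timeout 3600s;
    proxy_send_timeout 3600s;
}
EOF
}

# Only the three async call-start routes; /livekit/dispatch is left out
# on purpose because its rejection path is not side-effect-free.
for route in /attach /livekit/dialout /livekit/widget; do
    render_retry_location "$route" "${CXBCORE_WORKERS:-16}"
done
```

Passing the worker count as the second argument keeps proxy_next_upstream_tries equal to the number of workers on the host, so a request can be offered to every worker before nginx gives up.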

Systemd configuration

CXB Core uses a systemd template unit. Worker N gets socket /tmp/cxbcore_N.sock:
# /etc/systemd/system/cxbcore@.service
[Unit]
Description=CXB Core Worker %i
After=network.target

[Service]
Type=exec
Environment=CXBCORE_SOCKET=/tmp/cxbcore_%i.sock
ExecStart=/opt/core/scripts/run_worker.sh
WorkingDirectory=/opt/core
EnvironmentFile=/opt/core/.env
Restart=always
RestartSec=3

[Install]
WantedBy=multi-user.target
The run script detects socket mode and starts uvicorn on the assigned socket:
#!/usr/bin/env bash
# /opt/core/scripts/run_worker.sh
set -euo pipefail

SOCKET="${CXBCORE_SOCKET:-}"
PORT="${CXBCORE_PORT:-8001}"
HOST="${CXBCORE_HOST:-127.0.0.1}"

if [ -n "$SOCKET" ]; then
    # Unix domain socket mode (pre-forked architecture)
    rm -f "$SOCKET"
    exec uv run uvicorn cxbcore.app:app \
        --uds "$SOCKET" \
        --workers 1 \
        --app-dir src \
        --log-level "${LOG_LEVEL:-info}"
else
    # TCP port mode (legacy / development)
    exec uv run uvicorn cxbcore.app:app \
        --host "$HOST" \
        --port "$PORT" \
        --workers 1 \
        --app-dir src \
        --log-level "${LOG_LEVEL:-info}"
fi
Never change --workers in uvicorn. Each systemd instance IS one worker. Raising uvicorn's --workers above 1 would fork additional processes inside that instance, breaking the one-call-per-worker capacity model.
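
Because each worker is one instance of the template unit, bringing up a host means enabling one instance per worker. A minimal sketch, assuming instances are numbered 1 through N to match the socket names; worker_units is a hypothetical helper that only prints unit names.

```shell
#!/usr/bin/env bash
# Sketch: list the template-unit instances for N workers.
# worker_units N -> "cxbcore@1.service ... cxbcore@N.service"
worker_units() {
    local n i units
    n="$1"
    i=1
    units=""
    while [ "$i" -le "$n" ]; do
        units="${units}${units:+ }cxbcore@${i}.service"
        i=$(( i + 1 ))
    done
    echo "$units"
}

# Usage (needs root, so it is not executed here):
#   systemctl daemon-reload
#   systemctl enable --now $(worker_units 16)
worker_units 3
```

The worker count is still set by how many instances are enabled, not by any uvicorn flag.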

Service management

# Restart all loaded workers (swap in stop to stop them; the glob only matches units systemd has loaded)
systemctl restart 'cxbcore@*'

# Restart a single worker (zero downtime — others keep serving)
systemctl restart cxbcore@2

# View logs (all workers)
journalctl -u 'cxbcore@*' -f

# View logs (single worker)
journalctl -u cxbcore@3 -f

# Verify all sockets exist
ls /tmp/cxbcore_*.sock | wc -l
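
The socket count from ls can be turned into a pass/fail check. This is a sketch under the assumption that every worker socket lives in one directory and follows the cxbcore_N.sock naming; count_sockets and check_sockets are hypothetical helpers.

```shell
#!/usr/bin/env bash
# Sketch: compare the number of live worker sockets with the expected count.

# count_sockets DIR -> number of cxbcore_*.sock entries that are real sockets.
count_sockets() {
    local dir path count
    dir="$1"
    count=0
    for path in "$dir"/cxbcore_*.sock; do
        # -S is true only for Unix domain sockets, so stale regular
        # files and an unmatched glob pattern are not counted.
        if [ -S "$path" ]; then
            count=$(( count + 1 ))
        fi
    done
    echo "$count"
}

# check_sockets DIR EXPECTED -> exit status 0 only if the counts match.
check_sockets() {
    local found
    found="$(count_sockets "$1")"
    if [ "$found" -eq "$2" ]; then
        echo "ok: $found/$2 sockets"
    else
        echo "mismatch: $found/$2 sockets" >&2
        return 1
    fi
}

# Usage on a fleet host:
#   check_sockets /tmp "${CXBCORE_WORKERS:-16}"
```

A mismatch usually means at least one instance failed to start; journalctl for the missing instance number is the next place to look.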

nginx configuration

upstream cxbcore_workers {
    least_conn;
    server unix:/tmp/cxbcore_1.sock max_conns=1 fail_timeout=10;
    server unix:/tmp/cxbcore_2.sock max_conns=1 fail_timeout=10;
    ...
    server unix:/tmp/cxbcore_16.sock max_conns=1 fail_timeout=10;
}
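
The upstream block has one server line per worker, so it can be generated from the worker count. A minimal sketch, assuming sockets are named /tmp/cxbcore_N.sock as in the systemd unit; render_upstream is a hypothetical helper and is not the canonical template.

```shell
#!/usr/bin/env bash
# Sketch: print an upstream block with one server line per worker socket.
# render_upstream WORKER_COUNT (hypothetical helper)
render_upstream() {
    local workers i
    workers="$1"
    i=1
    echo "upstream cxbcore_workers {"
    echo "    least_conn;"
    while [ "$i" -le "$workers" ]; do
        echo "    server unix:/tmp/cxbcore_${i}.sock max_conns=1 fail_timeout=10;"
        i=$(( i + 1 ))
    done
    echo "}"
}

render_upstream "${CXBCORE_WORKERS:-16}"
```

Generating the block from the same worker count used for the systemd instances helps keep nginx and the running workers from drifting apart.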
The canonical fleet nginx template is tracked in the Core service repo at infra/nginx/fleet.conf.template.
Use map $http_upgrade to set the Connection header. A hardcoded Connection "upgrade" breaks plain HTTP POST endpoints such as dialout, which then return 422.
map $http_upgrade $connection_upgrade {
    default upgrade;
    ""      "";
}

server {
    location / {
        proxy_pass http://cxbcore_workers;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection $connection_upgrade;
        proxy_read_timeout 3600s;
        proxy_send_timeout 3600s;
    }
}
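
To make the map's behaviour concrete, its two branches can be mirrored in a few lines of shell. This only models the lookup; connection_upgrade is a hypothetical function named after the nginx variable. When the resulting value is empty, nginx does not pass the Connection header to the worker at all, which is what keeps plain POST requests working.

```shell
#!/usr/bin/env bash
# Sketch: mirror of the nginx map above.
# connection_upgrade UPGRADE_HEADER -> value assigned to $connection_upgrade
connection_upgrade() {
    if [ -n "${1:-}" ]; then
        # Client sent an Upgrade header (e.g. "websocket"): default branch.
        echo "upgrade"
    else
        # No Upgrade header (plain HTTP such as a POST): empty value.
        echo ""
    fi
}

echo "websocket request -> Connection: '$(connection_upgrade websocket)'"
echo "plain POST        -> Connection: '$(connection_upgrade "")'"
```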
Workers only listen on Unix sockets — all external traffic goes through nginx.

Multi-fleet ingress

A single fleet host (nginx + 16 workers) handles up to 16 concurrent calls. To front multiple fleet hosts behind one carrier-facing hostname for inbound WebSocket calls, a separate HAProxy WSS ingress sits in front of the fleets. See WSS ingress for the HAProxy layer, its capacity-aware health check, and the two-path procedure for adding a fleet.