Quick triage
| Problem | First checks |
|---|---|
| Campaign will not start | Bot exists, number pool not empty, concurrency > 0, time window valid, CSV uploaded. |
| Campaign running but no calls | Time window, CXB Dialler health, fleet capacity, pending/retryable rows. |
| Many no-answer/rejected | Carrier/SIP status, number quality, time of day, caller ID reputation. |
| Calls answer but bot does not join | CXB Core attach errors, fleet availability, LiveKit room state. |
| Bot says wrong customer info | CSV headers, CRM pre-fetch, prompt variables. |
| Retries not happening | Max attempts reached, retry delay not elapsed, outcome not in retry rules. |
| Attempt report missing expected data | Attempt still in progress or result webhook not finalized. |
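The "Retries not happening" row names three separate conditions, and it helps to check them independently. The sketch below is only an illustration of that reasoning: the `Row` fields, the `retry_blockers` helper, and its parameters are hypothetical names, not the dialler's actual schema or API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Row:
    # Hypothetical fields; the real row schema may differ.
    attempts: int
    last_outcome: str
    last_attempt_at: datetime


def retry_blockers(row, max_attempts, retry_delay, retry_outcomes, now):
    """Return the reasons (if any) why this row would not be retried yet."""
    blockers = []
    if row.attempts >= max_attempts:
        blockers.append("max attempts reached")
    if now - row.last_attempt_at < retry_delay:
        blockers.append("retry delay not elapsed")
    if row.last_outcome not in retry_outcomes:
        blockers.append("outcome not in retry rules")
    return blockers


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    row = Row(attempts=1, last_outcome="busy",
              last_attempt_at=now - timedelta(minutes=5))
    # Only the retry-rules condition blocks this row if the delay is 1 minute.
    print(retry_blockers(row, 3, timedelta(minutes=1), {"no_answer"}, now))
```

An empty list means none of the three conditions explains the missing retry, so the next place to look is the campaign window or dialler health.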
Data issues
If the bot reads a variable placeholder aloud, curly braces and all, then the variable is missing. Check:
- CSV header spelling
- CRM response field spelling
- prompt variable spelling
- flat vs namespaced variable usage
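The checks above can be partly automated by comparing the placeholders in a prompt against the variable names the data actually provides. This is a minimal sketch under assumptions: it guesses that placeholders look like `{name}` or `{{name}}`, optionally namespaced with a dot (for example `{{crm.balance}}`), and the function names are hypothetical. Adjust the pattern to the real prompt syntax.

```python
import re

# Assumed placeholder syntax: one or more braces around a (possibly dotted) name.
PLACEHOLDER = re.compile(r"\{+\s*([A-Za-z_][\w.]*)\s*\}+")


def missing_variables(prompt, available):
    """Return placeholder names used in the prompt that no data source provides."""
    missing = []
    for name in PLACEHOLDER.findall(prompt):
        if name not in available and name not in missing:
            missing.append(name)
    return missing


def near_matches(name, available):
    """Suggest available names that differ only by case or namespace prefix."""
    bare = name.split(".")[-1].lower()
    return sorted(a for a in available if a.split(".")[-1].lower() == bare)


if __name__ == "__main__":
    prompt = "Hello {{first_name}}, your balance is {{crm.balance}}."
    available = {"First_Name", "balance"}  # e.g. CSV headers plus CRM fields
    for name in missing_variables(prompt, available):
        print(name, "-> did you mean", near_matches(name, available))
```

A near match with different casing points to a spelling problem; a near match that differs only by prefix points to a flat vs namespaced mismatch.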
Dialler health
Engineering can check on the CXB Dialler host:
curl -fsS http://127.0.0.1:8090/health
uv run python scripts/smoke_check.py
The smoke check validates settings, MongoDB, indexes, health, metrics, fleet reachability, and stale campaign states.
Stuck states
Escalate if many rows have been sitting for a long time in any of these states:
leased
ringing
amd_screening
attaching
dialling
in_progress
_processing
callback_scheduled and retry_scheduled are normal waiting states, not stuck states: they hold until their due time. Only escalate if a callback_scheduled row stays past its scheduled time while the campaign is running and within its window.
The in-flight states listed above should move forward or be recovered by stale handling. Persistent buildup means the dialler, LiveKit, CXB Core, or the result path needs investigation.
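To judge whether there is a buildup, it can help to count rows per in-flight state that have not changed for longer than some threshold. The sketch below assumes each row is a dict with hypothetical `state` and `updated_at` keys; the real field names and a sensible threshold depend on the deployment.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

# In-flight states copied from the list above ("_processing" as written there).
# callback_scheduled and retry_scheduled are deliberately excluded.
IN_FLIGHT = {
    "leased", "ringing", "amd_screening", "attaching",
    "dialling", "in_progress", "_processing",
}


def stale_counts(rows, now, max_age):
    """Count rows per in-flight state whose last update is older than max_age."""
    counts = Counter()
    for row in rows:
        if row["state"] in IN_FLIGHT and now - row["updated_at"] > max_age:
            counts[row["state"]] += 1
    return dict(counts)


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    rows = [
        {"state": "ringing", "updated_at": now - timedelta(minutes=20)},
        {"state": "ringing", "updated_at": now - timedelta(seconds=10)},
        {"state": "retry_scheduled", "updated_at": now - timedelta(hours=2)},
    ]
    print(stale_counts(rows, now, timedelta(minutes=5)))
```

Counts that keep growing across repeated runs are the signal to escalate; a single old row is more likely a one-off.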
SIP patterns
Repeated SIP status patterns usually point outside bot logic:
| Pattern | Common meaning |
|---|---|
| 408 / 480 | No answer or temporarily unavailable. |
| 486 / 603 | Busy/rejected/declined. |
| 429 | Rate limited. |
| 5xx | Carrier/provider/server issue. |
How the dialler maps SIP codes to a disconnect reason: on an unanswered dial, a SIP code of 408 or 480, or an empty or missing code, is recorded as no_answer; any other non-answer code is recorded as rejected. So 486 and 603 (and most other failure codes) become rejected, while ring-no-answer and timeouts become no_answer.
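The mapping rule described above can be written out as a small function. This is an illustrative sketch of the stated rule, not the dialler's actual code; the name `disconnect_reason` is hypothetical, and it only applies to dials that were not answered.

```python
def disconnect_reason(sip_code):
    """Map the SIP code of an unanswered dial to a disconnect reason."""
    code = "" if sip_code is None else str(sip_code).strip()
    # Empty or missing codes, 408 and 480 are treated as ring-no-answer/timeouts.
    if code in {"", "408", "480"}:
        return "no_answer"
    # Every other non-answer code is recorded as rejected.
    return "rejected"


if __name__ == "__main__":
    for code in (408, 480, None, "", 486, 603, 429, 503):
        print(repr(code), "->", disconnect_reason(code))
```

Because 429 and 5xx codes also land in rejected, a spike in rejected attempts is worth breaking down by raw SIP code before drawing conclusions.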
Do not change the bot prompt to fix SIP failures.