Good call quality does not come from the LLM alone. It comes from the combination of listening, turn-taking, speaking, grounding, handoff, reliability, and measurement. CXB Core has several small controls that together make calls feel less robotic on real phone lines.
Quality model
| Layer | What it protects |
|---|---|
| Listening | The bot should hear the customer clearly and know when speech starts. |
| Turn-taking | The bot should not interrupt too early or wait forever. |
| Speaking | Responses should sound natural and avoid awkward silence. |
| Context | The bot should use CRM data, tools, and knowledge without guessing. |
| Recovery | Silence, voicemail, transfer, and failure cases should end cleanly. |
| Measurement | Engineers should be able to see why a call felt slow or wrong. |
Better listening
CXB Core uses multiple listening controls before the LLM ever sees a user turn:
- Voice activity detection (VAD) detects speech start/stop from audio.
- The turn-detection model helps decide when the user has finished their turn.
- The turn-detecting STT engine can use external turn events instead of VAD-based turn strategies.
- Interim transcripts let the pipeline react quickly while speech is still streaming.
- Initial bot-turn guard protects the opening message from early noise or accidental interruption.
- Backchannel filtering avoids treating filler sounds like `haan`, `hmm`, `okay`, or `ji` as meaningful interruptions.
The goal is simple: do not make the bot respond to noise, and do not make the customer repeat themselves.
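Backchannel filtering can be sketched as a check that every word in a short transcript is a known filler. This is a minimal illustration, not CXB Core's implementation: the `BACKCHANNEL_WORDS` set and the `is_backchannel` function are hypothetical names, and the real filler list is configuration that this page does not show.

```python
import re

# Hypothetical filler list for illustration; the configured list is not shown here.
BACKCHANNEL_WORDS = {"haan", "hmm", "okay", "ok", "ji"}


def is_backchannel(transcript: str) -> bool:
    """Return True when the transcript contains only filler words."""
    words = re.findall(r"\w+", transcript.lower())
    # An empty transcript is not a backchannel; it is simply nothing.
    return bool(words) and all(word in BACKCHANNEL_WORDS for word in words)
```

Under this sketch, "Hmm, okay" would be ignored while the bot is speaking, but "okay I already paid" would still count as a real user turn because it contains non-filler words.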
Better interruption behavior
Phone calls are messy. Customers speak over the bot, give one-word acknowledgements, or say an important phrase while the bot is still talking.
CXB Core balances this with:
- `min_words_interruption`, which blocks accidental one-word interruptions.
- high-intent interruption detection for phrases such as `wrong number`, `already paid`, `agent`, `settlement`, and similar Hindi/English variants.
- TTS cancellation when the customer truly interrupts.
- user-turn strategies that use transcript content, not only raw VAD blips.
This is one of the highest-value parts of the runtime. It makes the bot feel interruptible without letting every small sound derail the conversation.
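The interplay between the word-count threshold and high-intent phrases might look like the sketch below. It assumes the check runs only while the bot is speaking. The phrase list, the default of 3 words, and the `should_interrupt` function are illustrative assumptions, not the runtime's actual values or API.

```python
import re

# Hypothetical phrase list; the real one also covers Hindi/English variants.
HIGH_INTENT_PHRASES = ("wrong number", "already paid", "agent", "settlement")


def should_interrupt(transcript: str, min_words_interruption: int = 3) -> bool:
    """Decide whether speech heard while the bot is talking should cancel TTS."""
    words = re.findall(r"[a-z0-9']+", transcript.lower())
    if not words:
        return False
    # Pad with spaces so phrases match on whole-word boundaries only.
    padded = " " + " ".join(words) + " "
    if any(f" {phrase} " in padded for phrase in HIGH_INTENT_PHRASES):
        return True  # a high-intent phrase bypasses the word-count threshold
    return len(words) >= min_words_interruption
```

With these assumed settings, a lone "okay" does not stop the bot, while "agent" or "wrong number" interrupts immediately even though each is shorter than the threshold.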
Better speaking
Speaking quality comes from both voice selection and timing.
| Control | Benefit |
|---|---|
| Multiple TTS engines | Better voices for English, Hindi, and Indian-language campaigns. |
| TTS Redis cache | Lower latency and lower TTS cost for repeated phrases. |
| Live prompt cache | Lower time-to-first-token and lower LLM cost by serving the static system prompt from a provider-side cache. See Caching. |
| Configurable opening message | Ops can tune the first impression per bot. |
| Re-engagement messages | Silence is handled conversationally instead of as an abrupt failure. |
| Tool pre-message policies | The bot can speak while a slow API is running. |
Custom tools can run with timing policies such as immediate, speak-then-run, speak-and-run-parallel, and terminal-after-speech. This prevents avoidable dead air during API calls.
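One way to picture the timing policies is as different orderings of "speak the pre-message" and "call the tool". The sketch below uses `asyncio` and is an interpretation only: the `ToolTiming` enum, the `run_tool` function, and the exact semantics of each policy (in particular, treating terminal-after-speech as "finish speaking, then run the final tool") are assumptions, not documented behavior.

```python
import asyncio
from enum import Enum


class ToolTiming(Enum):
    IMMEDIATE = "immediate"
    SPEAK_THEN_RUN = "speak-then-run"
    SPEAK_AND_RUN_PARALLEL = "speak-and-run-parallel"
    TERMINAL_AFTER_SPEECH = "terminal-after-speech"


async def run_tool(timing, speak, call_tool, pre_message):
    """Run a tool under an assumed interpretation of each timing policy."""
    if timing is ToolTiming.IMMEDIATE:
        # No pre-message: call the tool straight away.
        return await call_tool()
    if timing is ToolTiming.SPEAK_AND_RUN_PARALLEL:
        # Speech covers the API latency instead of adding to it.
        _, result = await asyncio.gather(speak(pre_message), call_tool())
        return result
    # SPEAK_THEN_RUN and TERMINAL_AFTER_SPEECH: finish speaking first,
    # then run the tool (assumed to be a call-ending tool in the terminal case).
    await speak(pre_message)
    return await call_tool()
```

The parallel policy is the one that removes dead air for slow APIs: the customer hears "one moment while I check" during the request rather than before it.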
Silence and dead air handling
Dead air is common in outbound campaigns. CXB Core handles it explicitly:
- Wait for the configured silence gap.
- Speak a re-engagement message.
- Shorten or repeat gaps as configured.
- End as `RNR` after retries are exhausted.
Calls where the customer never spoke skip LLM analysis and receive auto-dispositions. This keeps reporting clean and avoids wasting post-call tokens on empty transcripts.
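The silence flow above can be sketched as a small decision function. The `SilencePolicy` fields, the gap values, and the messages are hypothetical placeholders; only the overall shape (wait, re-engage, shorten gaps, end as `RNR`, skip analysis for empty calls) comes from the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SilencePolicy:
    # Assumed shape: one silence gap (seconds) and one message per attempt.
    gaps_s: tuple = (8.0, 6.0, 4.0)
    messages: tuple = (
        "Hello, are you there?",
        "I am not able to hear you.",
        "I will try again later.",
    )


def next_silence_action(policy: SilencePolicy, attempts_done: int):
    """Return ('reengage', gap, message), or ('end', 'RNR') when retries run out."""
    if attempts_done >= len(policy.gaps_s):
        return ("end", "RNR")
    return ("reengage", policy.gaps_s[attempts_done], policy.messages[attempts_done])


def needs_llm_analysis(user_turn_count: int) -> bool:
    """Calls where the customer never spoke get an auto-disposition instead."""
    return user_turn_count > 0
```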
Context and grounding
The bot sounds better when it has the right facts at the right time.
- CRM variables are injected into prompts and opening messages.
- SIP headers and variables are preserved for transfer and tool context.
- Knowledge Base RAG adds the `search_knowledge` tool only when knowledge is enabled and attached.
- Custom HTTP tools let the bot fetch or update external systems during the call.
- Tool telemetry records arguments, response payloads, status, timeout, and duration.
The design principle is that CXB Core executes the call, while CXB API owns the business data and configuration.
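Two of the grounding steps listed above, variable injection and conditional tool registration, can be illustrated with the standard library alone. The `$name` placeholder syntax, the function names, and the parameter names are assumptions for this sketch; the actual template format used by CXB Core is not described here.

```python
from string import Template


def render_opening(template: str, crm_variables: dict) -> str:
    """Fill CRM variables into a message; unknown placeholders are left as-is."""
    return Template(template).safe_substitute(crm_variables)


def build_tools(custom_tools: list, knowledge_enabled: bool, attached_kb_ids: list) -> list:
    """Expose search_knowledge only when knowledge is both enabled and attached."""
    tools = list(custom_tools)
    if knowledge_enabled and attached_kb_ids:
        tools.append("search_knowledge")
    return tools
```

Leaving unknown placeholders untouched, rather than raising, is a deliberate choice in this sketch: a missing CRM field then shows up visibly in test calls instead of crashing the call at start.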
Human handoff
When automation should stop, the bot needs a clean exit path.
CXB Core supports:
- `end_call` for intentional bot hangup.
- `transfer_call` for configured phone/SIP/WebSocket transfer targets.
- Agent Desk handoff for live human queues.
- handoff context containing transcript, variables, customer details, flags, tags, and reason.
- durable Agent Desk outbox retry if enqueue fails temporarily.
Good communication quality includes knowing when not to keep the bot talking.
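The outbox idea behind handoff retry can be sketched as follows: record the handoff context first, then attempt delivery, and keep anything that fails for a later flush. This is an in-memory stand-in under stated assumptions; the `AgentDeskOutbox` class, its methods, and the use of `ConnectionError` as the transient failure are hypothetical, and a real outbox would persist entries to durable storage.

```python
import json
from collections import deque


class AgentDeskOutbox:
    """In-memory stand-in for a durable handoff outbox (illustration only)."""

    def __init__(self, enqueue):
        self._enqueue = enqueue  # callable that pushes a context to the agent queue
        self._pending = deque()

    def submit(self, context: dict) -> int:
        # Store the serialized context before attempting delivery.
        self._pending.append(json.dumps(context))
        return self.flush()

    def flush(self) -> int:
        """Deliver pending handoffs in order; return how many were delivered."""
        delivered = 0
        while self._pending:
            try:
                self._enqueue(json.loads(self._pending[0]))
            except ConnectionError:
                break  # transient failure: keep the entry for the next flush
            self._pending.popleft()
            delivered += 1
        return delivered
```

A context passed to `submit` would carry the fields listed above, for example `{"reason": ..., "transcript": [...], "variables": {...}, "tags": [...]}`.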
Reliability during live calls
Runtime isolation also affects communication quality:
| Control | Why it matters |
|---|---|
| One call per worker | A stuck pipeline affects one call, not many calls. |
| `nginx max_conns=1` | A busy worker should not receive another live call. |
| `CallTracker` | App-level capacity guard for call routes. |
| Result outbox | Call results survive transient CXB API failures. |
| Agent Desk outbox | Human handoff attempts are not lost on transient enqueue failure. |
| Shared post-call logic | All transports get the same disposition, analysis, QC, and webhook behavior. |
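An app-level capacity guard of the kind listed in the table can be as small as a locked counter. The sketch below is not the actual `CallTracker`; the `CallSlot` class and its methods are hypothetical and only show the idea of refusing a second live call on a busy worker.

```python
import threading


class CallSlot:
    """Hypothetical one-call-per-worker guard (not the real CallTracker API)."""

    def __init__(self, capacity: int = 1):
        self._capacity = capacity
        self._active = 0
        self._lock = threading.Lock()

    def try_acquire(self) -> bool:
        """Claim a slot for a new call; return False if the worker is busy."""
        with self._lock:
            if self._active >= self._capacity:
                return False
            self._active += 1
            return True

    def release(self) -> None:
        """Free the slot when the call ends; never drops below zero."""
        with self._lock:
            self._active = max(0, self._active - 1)
```

A call route would call `try_acquire()` on entry, reject the request when it returns `False`, and call `release()` in a `finally` block so a crashed pipeline does not leave the worker marked busy.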
Measurement
CXB Core emits enough detail to debug call quality after the fact:
| Signal | Use |
|---|---|
| `stt_ms` | Detect slow speech recognition. |
| `llm_ms` | Detect slow model response. |
| `tts_ms` | Detect slow speech generation. |
| `tool_ms` | Detect slow custom API calls. |
| `rag_ms` | Detect slow knowledge search. |
| Events | See tool calls, knowledge search, disconnect reason, and pipeline state. |
| `service_error` event | Classifies a pipeline failure as `llm`/`tts`/`stt`/`unknown`, with the failing processor, truncated message, and a fatal flag. First place to look when a call ends abruptly. |
| User-turn events | `user_turn_started` / `user_turn_inference_triggered` / `turn_stop_timeout` / `user_turn_stopped` show how each turn was detected and closed. |
| Recording | Hear what the customer heard. |
| Transcript | Compare what was said with what was recognized. |
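As an example of using the latency signals in the table, the sketch below finds the stage with the highest median latency across a call's turns. The sample format (one dict per turn, keyed by signal name, with unused stages omitted) and the `slowest_stage` function are assumptions made for illustration.

```python
from statistics import median

STAGES = ("stt_ms", "llm_ms", "tts_ms", "tool_ms", "rag_ms")


def slowest_stage(samples: list):
    """Return (stage, median_ms) for the slowest stage, or None if no data."""
    medians = {}
    for stage in STAGES:
        values = [sample[stage] for sample in samples if stage in sample]
        if values:
            medians[stage] = median(values)
    if not medians:
        return None
    stage = max(medians, key=medians.get)
    return stage, medians[stage]
```

Medians are used here rather than means so that a single slow turn, such as one long tool call, does not hide a stage that is consistently slow.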
The user-turn stop-reason telemetry explains the hardest call-quality bugs. A `turn_stop_timeout` event, or a `user_turn_stopped` event with `inference_triggered = false` but `had_content = true`, means the watchdog force-closed the customer's turn and its transcript never reached the LLM. This is the signature of dead air caused by a force-closed turn. Pair it with VAD settings and `service_error` events to find the cause.
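The force-closed-turn signature can be checked mechanically over an exported event list. The sketch below assumes each event is a dict with a `type` key plus the `inference_triggered` and `had_content` fields named above; that event shape and the `find_force_closed_turns` function are assumptions, not a documented export format.

```python
def find_force_closed_turns(events: list) -> list:
    """Return indexes of events that match the force-closed-turn signature."""
    flagged = []
    for index, event in enumerate(events):
        kind = event.get("type")
        if kind == "turn_stop_timeout":
            flagged.append(index)
        elif (
            kind == "user_turn_stopped"
            and event.get("inference_triggered") is False
            and event.get("had_content") is True
        ):
            # The customer said something, but it never triggered inference.
            flagged.append(index)
    return flagged
```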
When a call feels bad, engineers should inspect the transcript, recording, events, latency samples, tool telemetry, and final disposition together. One signal alone rarely explains the whole call.
What to tune first
| Symptom | First places to check |
|---|---|
| Bot interrupts too easily | `min_words_interruption`, backchannel text, VAD settings, STT interim behavior |
| Bot responds too slowly | STT/LLM/TTS latency, TTS cache, tool policy, RAG latency |
| Bot talks over the customer | turn detection strategy, VAD stop timing, turn-detecting STT settings |
| Long silence during API calls | custom tool timing policy and pre-message guidance |
| Bad answers from documents | KB attachment, trigger instructions, `search_knowledge` events |
| Poor campaign result quality | re-engagement, auto-dispositions, voicemail handling, test-call matrix |
This is the communication-quality engineering layer. The model matters, but the production experience comes from these controls working together.