Communication quality

Good call quality depends on more than the LLM. It is the combination of listening, turn-taking, speaking, grounding, handoff, reliability, and measurement. CXB Core has several small controls that together make calls feel less robotic on real phone lines.

Quality model

Layer	What it protects
Listening	The bot should hear the customer clearly and know when speech starts.
Turn-taking	The bot should not interrupt too early or wait forever.
Speaking	Responses should sound natural and avoid awkward silence.
Context	The bot should use CRM data, tools, and knowledge without guessing.
Recovery	Silence, voicemail, transfer, and failure cases should end cleanly.
Measurement	Engineers should be able to see why a call felt slow or wrong.

Better listening

CXB Core uses multiple listening controls before the LLM ever sees a user turn:

Voice activity detection (VAD) detects speech start/stop from audio.
The turn-detection model helps decide when the user has finished their turn.
With a turn-detecting STT engine, the pipeline can use the engine's external turn events instead of VAD-based turn strategies.
Interim transcripts let the pipeline react quickly while speech is still streaming.
An initial bot-turn guard protects the opening message from early noise or accidental interruption.
Backchannel filtering avoids treating filler sounds like haan, hmm, okay, or ji as meaningful interruptions.
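A minimal sketch of the backchannel check, assuming the filter runs on transcript text and uses a configurable filler list. `BACKCHANNEL_WORDS` and `is_backchannel` are hypothetical names for illustration, not CXB Core's API.

```python
import re

# Hypothetical filler list (romanized Hindi and English acknowledgements).
BACKCHANNEL_WORDS = {"haan", "hmm", "hm", "okay", "ok", "ji", "achha"}


def is_backchannel(transcript: str, fillers: set[str] = BACKCHANNEL_WORDS) -> bool:
    """Return True when every recognized word is a filler sound.

    Such turns are acknowledgements, not interruptions, so the bot can keep
    speaking instead of cancelling TTS.
    """
    # Lowercase and keep only word characters so "Hmm," and "OK..." match.
    words = re.findall(r"\w+", transcript.lower())
    # all() is True for an empty list: a turn with no words carries no content.
    return all(word in fillers for word in words)
```

Returning True for an empty transcript is a deliberate choice in this sketch: a turn with no recognized words should not cancel the bot's speech either.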

The goal is simple: do not make the bot respond to noise, and do not make the customer repeat themselves.

Better interruption behavior

Phone calls are messy. Customers speak over the bot, give one-word acknowledgements, or say an important phrase while the bot is still talking. CXB Core balances responsiveness against false interruptions with:

`min_words_interruption`, which blocks accidental one-word interruptions.
high-intent interruption detection for phrases such as "wrong number", "already paid", "agent", "settlement", and similar Hindi/English variants.
TTS cancellation when the customer truly interrupts.
user-turn strategies that use transcript content, not only raw VAD blips.
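The interplay between a minimum word count and high-intent phrases can be sketched as one decision function. This is an illustration under assumptions: the phrase list, the default threshold of 3, the `should_interrupt` name, and the rule that a high-intent phrase bypasses the word threshold are all hypothetical, not a description of CXB Core's exact logic.

```python
import re

# Hypothetical high-intent phrases: English plus a romanized Hindi variant.
HIGH_INTENT_PHRASES = ("wrong number", "already paid", "pay kar diya", "agent", "settlement")


def should_interrupt(transcript: str, min_words_interruption: int = 3) -> bool:
    """Decide whether speech heard while the bot is talking should cancel TTS."""
    # Normalize to lowercase words separated by single spaces.
    text = " ".join(re.findall(r"\w+", transcript.lower()))
    # Short but important phrases interrupt regardless of length.
    for phrase in HIGH_INTENT_PHRASES:
        if re.search(rf"\b{re.escape(phrase)}\b", text):
            return True
    # Otherwise require enough words to rule out blips and acknowledgements.
    return len(text.split()) >= min_words_interruption
```

The word boundaries keep "agent" from matching inside longer words, and the threshold only applies when no high-intent phrase is present.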

This is one of the highest-value parts of the runtime. It makes the bot feel interruptible without letting every small sound derail the conversation.

Better speaking

Speaking quality comes from both voice selection and timing.

Control	Benefit
Multiple TTS engines	Better voices for English, Hindi, and Indian-language campaigns.
TTS Redis cache	Lower latency and lower TTS cost for repeated phrases.
Live prompt cache	Lower time-to-first-token and lower LLM cost by serving the static system prompt from a provider-side cache. See Caching.
Configurable opening message	Ops can tune the first impression per bot.
Re-engagement messages	Silence is handled conversationally instead of as an abrupt failure.
Tool pre-message policies	The bot can speak while a slow API is running.
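A sketch of how a TTS cache key might be built, assuming the key covers everything that changes the audio (engine, voice, sample rate, text). A plain dict stands in for Redis here; with a Redis client the lookup and store would be a GET and a SET with an expiry. `tts_cache_key` and `synthesize_cached` are hypothetical names.

```python
import hashlib
import json
from typing import Callable, MutableMapping


def tts_cache_key(engine: str, voice: str, text: str, sample_rate: int) -> str:
    """Deterministic key: same engine, voice, rate, and text give the same audio."""
    # Collapse whitespace so "Hello  there" and "Hello there" share one entry.
    normalized = " ".join(text.split())
    payload = json.dumps(
        {"engine": engine, "voice": voice, "rate": sample_rate, "text": normalized},
        sort_keys=True,
        ensure_ascii=False,
    )
    return "tts:" + hashlib.sha256(payload.encode("utf-8")).hexdigest()


def synthesize_cached(
    cache: MutableMapping[str, bytes],
    key: str,
    synthesize: Callable[[], bytes],
) -> bytes:
    """Return cached audio, calling the TTS engine only on a miss."""
    audio = cache.get(key)
    if audio is None:
        audio = synthesize()
        cache[key] = audio
    return audio
```

Opening messages and re-engagement prompts repeat across many calls, which is where a cache like this pays off; text with per-customer CRM values mostly misses.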

Custom tools can run with timing policies such as immediate, speak-then-run, speak-and-run-parallel, and terminal-after-speech. This prevents avoidable dead air during API calls.
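A minimal asyncio sketch of the four timing policies, assuming the pre-message and the API call are both coroutines. The policy names come from this page, but their semantics here are an interpretation; `ToolTiming`, `run_with_policy`, `demo_policy`, and the `speak`/`call_api`/`end_call` callables are hypothetical.

```python
import asyncio
from enum import Enum
from typing import Any, Awaitable, Callable


class ToolTiming(Enum):
    IMMEDIATE = "immediate"
    SPEAK_THEN_RUN = "speak-then-run"
    SPEAK_AND_RUN_PARALLEL = "speak-and-run-parallel"
    TERMINAL_AFTER_SPEECH = "terminal-after-speech"


async def run_with_policy(
    policy: ToolTiming,
    speak: Callable[[str], Awaitable[None]],
    call_api: Callable[[], Awaitable[Any]],
    end_call: Callable[[], Awaitable[None]],
    pre_message: str = "",
) -> Any:
    """Run one tool call under a timing policy (interpretation of the names)."""
    if policy is ToolTiming.IMMEDIATE:
        # No pre-message: suited to tools that return quickly.
        return await call_api()
    if policy is ToolTiming.SPEAK_AND_RUN_PARALLEL:
        # Speech covers the API latency; wait for both to finish.
        _, result = await asyncio.gather(speak(pre_message), call_api())
        return result
    # The two remaining policies finish speaking before the tool starts.
    await speak(pre_message)
    result = await call_api()
    if policy is ToolTiming.TERMINAL_AFTER_SPEECH:
        # The tool is the call's final action, so hang up afterwards.
        await end_call()
    return result


async def demo_policy(policy: ToolTiming) -> tuple[Any, list[str]]:
    """Record the order of actions for one policy using stub callables."""
    actions: list[str] = []

    async def speak(message: str) -> None:
        actions.append(f"speak:{message}")

    async def call_api() -> str:
        actions.append("api")
        return "done"

    async def end_call() -> None:
        actions.append("end")

    result = await run_with_policy(policy, speak, call_api, end_call, "One moment, please")
    return result, actions
```

The parallel policy removes the most dead air, but its pre-message should not promise an outcome the tool might not return.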

Silence and dead air handling

Dead air is common in outbound campaigns. CXB Core handles it explicitly:

Wait for the configured silence gap.
Speak a re-engagement message.
Shorten or repeat gaps as configured.
End as RNR after retries are exhausted.
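The steps above can be sketched as a small asyncio loop, assuming the listening layer sets an `asyncio.Event` when the customer starts speaking. `SilenceConfig`, `handle_silence`, `demo_silence`, and the gap and message defaults are hypothetical; only the RNR outcome comes from this page.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Awaitable, Callable


@dataclass
class SilenceConfig:
    # Hypothetical defaults. One more gap than messages, so the last
    # re-engagement message still gets a chance to be answered.
    gaps_s: list[float] = field(default_factory=lambda: [6.0, 4.0, 4.0])
    messages: list[str] = field(
        default_factory=lambda: ["Hello, are you there?", "Sorry, I can't hear you. Are you still on the line?"]
    )


async def wait_for_speech(speech: asyncio.Event, gap_s: float) -> bool:
    """Wait up to gap_s seconds for the listening layer to signal speech."""
    try:
        await asyncio.wait_for(speech.wait(), timeout=gap_s)
        return True
    except asyncio.TimeoutError:
        return False


async def handle_silence(
    cfg: SilenceConfig,
    speech: asyncio.Event,
    speak: Callable[[str], Awaitable[None]],
) -> str:
    """Return "resumed" if the customer speaks, or "RNR" once retries run out."""
    for attempt, gap in enumerate(cfg.gaps_s):
        if await wait_for_speech(speech, gap):
            return "resumed"
        if attempt < len(cfg.messages):
            await speak(cfg.messages[attempt])
    return "RNR"


async def demo_silence(customer_speaks: bool) -> tuple[str, list[str]]:
    """Run the loop with tiny gaps; optionally simulate customer speech."""
    speech = asyncio.Event()
    if customer_speaks:
        speech.set()
    spoken: list[str] = []

    async def speak(message: str) -> None:
        spoken.append(message)

    outcome = await handle_silence(SilenceConfig(gaps_s=[0.01, 0.01, 0.01]), speech, speak)
    return outcome, spoken
```

Shortening later gaps (here 6 s, then 4 s) is one way to express "shorten or repeat gaps as configured"; repeating a gap is just the same value listed again.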

Calls where the customer never spoke skip LLM analysis and receive auto-dispositions. This keeps reporting clean and avoids wasting post-call tokens on empty transcripts.
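The routing rule can be sketched in a few lines, assuming the transcript is a list of role-tagged turns. `Turn`, `needs_llm_analysis`, `post_call_route`, and the route labels are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    role: str  # "user" or "assistant"
    text: str


def needs_llm_analysis(transcript: list[Turn]) -> bool:
    """True only if the customer produced at least one non-empty utterance."""
    return any(turn.role == "user" and turn.text.strip() for turn in transcript)


def post_call_route(transcript: list[Turn]) -> str:
    # Silent calls skip the LLM and go straight to an auto-disposition.
    return "llm_analysis" if needs_llm_analysis(transcript) else "auto_disposition"
```

Checking for non-empty text, not just for the presence of a user turn, guards against a transcription step that may emit empty results on line noise.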

Context and grounding

The bot sounds better when it has the right facts at the right time.

CRM variables are injected into prompts and opening messages.
SIP headers and variables are preserved for transfer and tool context.
Knowledge Base RAG adds the `search_knowledge` tool only when knowledge is enabled and attached.
Custom HTTP tools let the bot fetch or update external systems during the call.
Tool telemetry records arguments, response payloads, status, timeout, and duration.
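Two of the grounding rules above, variable injection and conditional tool exposure, can be sketched as follows. This assumes Python's `string.Template` `$name` placeholder syntax purely for illustration; CXB Core's actual placeholder format may differ, and `render_opening` and `build_tools` are hypothetical.

```python
from string import Template


def render_opening(template: str, crm: dict[str, str]) -> str:
    """Fill CRM placeholders such as $name from the call's variables."""
    # safe_substitute leaves unknown placeholders as-is instead of raising,
    # which makes a missing CRM field easy to spot in test calls.
    return Template(template).safe_substitute(crm)


def build_tools(base_tools: list[str], knowledge_enabled: bool, kb_ids: list[str]) -> list[str]:
    """Add search_knowledge only when knowledge is enabled and a KB is attached."""
    tools = list(base_tools)
    if knowledge_enabled and kb_ids:
        tools.append("search_knowledge")
    return tools
```

Not exposing `search_knowledge` when no knowledge base is attached keeps the model from calling a tool that can only return nothing.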

The design principle is that CXB Core executes the call, while CXB API owns the business data and configuration.

Human handoff

When automation should stop, the bot needs a clean exit path. CXB Core supports:

`end_call` for intentional bot hangup.
`transfer_call` for configured phone/SIP/WebSocket transfer targets.
Agent Desk handoff for live human queues.
handoff context containing transcript, variables, customer details, flags, tags, and reason.
durable Agent Desk outbox retry if enqueue fails temporarily.
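A sketch of the handoff context as a serializable record, with one field per item in the list above. The `HandoffContext` class and its field names are hypothetical; the real Agent Desk payload may be shaped differently.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class HandoffContext:
    reason: str
    transcript: list[dict[str, str]]
    variables: dict[str, str] = field(default_factory=dict)
    customer: dict[str, str] = field(default_factory=dict)
    flags: list[str] = field(default_factory=list)
    tags: list[str] = field(default_factory=list)

    def to_json(self) -> str:
        # ensure_ascii=False keeps Hindi text readable for the human agent.
        return json.dumps(asdict(self), ensure_ascii=False)
```

Sending the transcript and reason with the handoff means the human agent does not have to ask the customer to repeat themselves.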

Good communication quality includes knowing when not to keep the bot talking.

Reliability during live calls

Runtime isolation also affects communication quality:

Control	Why it matters
One call per worker	A stuck pipeline affects one call, not many calls.
nginx `max_conns=1`	A busy worker should not receive another live call.
`CallTracker`	App-level capacity guard for call routes.
Result outbox	Call results survive transient CXB API failures.
Agent Desk outbox	Human handoff attempts are not lost on transient enqueue failure.
Shared post-call logic	All transports get the same disposition, analysis, QC, and webhook behavior.
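The table states the guarantee of the result and Agent Desk outboxes but not their mechanism. One common way to get that guarantee is a store-and-forward outbox; the sketch below uses SQLite from the standard library and assumes payloads are JSON dicts and that network errors (`OSError` subclasses such as `ConnectionError`) are the transient failures. The `Outbox` class is hypothetical and says nothing about how CXB Core actually persists results.

```python
import json
import sqlite3
from typing import Any, Callable


class Outbox:
    """Write results locally first, then deliver and delete them."""

    def __init__(self, path: str = ":memory:") -> None:
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS outbox ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "payload TEXT NOT NULL, "
            "attempts INTEGER NOT NULL DEFAULT 0)"
        )
        self.db.commit()

    def enqueue(self, payload: dict[str, Any]) -> None:
        self.db.execute("INSERT INTO outbox (payload) VALUES (?)", (json.dumps(payload),))
        self.db.commit()

    def flush(self, send: Callable[[dict[str, Any]], None]) -> int:
        """Try to deliver every pending row; return how many were delivered."""
        delivered = 0
        rows = self.db.execute("SELECT id, payload FROM outbox ORDER BY id").fetchall()
        for row_id, payload in rows:
            try:
                send(json.loads(payload))
            except OSError:
                # Transient network failure: keep the row and count the attempt.
                # Other exceptions propagate so real bugs are not retried forever.
                self.db.execute("UPDATE outbox SET attempts = attempts + 1 WHERE id = ?", (row_id,))
            else:
                self.db.execute("DELETE FROM outbox WHERE id = ?", (row_id,))
                delivered += 1
            self.db.commit()
        return delivered

    def pending(self) -> int:
        return self.db.execute("SELECT COUNT(*) FROM outbox").fetchone()[0]
```

Because the result is committed before any network call, an outage delays delivery instead of losing the result; a background task would call `flush()` periodically, ideally with backoff.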

Measurement

CXB Core emits enough detail to debug call quality after the fact:

Signal	Use
`stt_ms`	Detect slow speech recognition.
`llm_ms`	Detect slow model response.
`tts_ms`	Detect slow speech generation.
`tool_ms`	Detect slow custom API calls.
`rag_ms`	Detect slow knowledge search.
Events	See tool calls, knowledge search, disconnect reason, and pipeline state.
`service_error` event	Classifies a pipeline failure as `llm`/`tts`/`stt`/`unknown`, with the failing processor, truncated message, and a `fatal` flag. First place to look when a call ends abruptly.
User-turn events	`user_turn_started` / `user_turn_inference_triggered` / `turn_stop_timeout` / `user_turn_stopped` show how each turn was detected and closed.
Recording	Hear what the customer heard.
Transcript	Compare what was said with what was recognized.
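Per-call latency samples are most useful in aggregate, where a slow stage stands out. A sketch, assuming each sample is a dict keyed by signal names such as `stt_ms` and `llm_ms` (the sample shape and `latency_summary` are hypothetical):

```python
from statistics import median, quantiles


def latency_summary(samples: list[dict[str, float]]) -> dict[str, dict[str, float]]:
    """Compute p50 and p95 for every stage that appears in the samples."""
    stages = sorted({key for sample in samples for key in sample})
    summary: dict[str, dict[str, float]] = {}
    for stage in stages:
        values = [sample[stage] for sample in samples if stage in sample]
        if len(values) >= 2:
            # The inclusive method interpolates within the observed range,
            # so p95 never exceeds the slowest sample.
            p95 = quantiles(values, n=20, method="inclusive")[-1]
        else:
            p95 = values[0]
        summary[stage] = {"p50": median(values), "p95": p95}
    return summary
```

Comparing p95 across stages points at the first place to look: a high `tool_ms` tail suggests a tool timing policy change, while a high `tts_ms` tail suggests checking the TTS cache hit rate.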

The user-turn stop-reason telemetry explains many of the hardest call-quality bugs. A `turn_stop_timeout` event, or a `user_turn_stopped` event with `inference_triggered = false` but `had_content = true`, means the watchdog force-closed the customer's turn and its transcript never reached the LLM. This is the dead-air (force-closed turn) signature. Pair it with VAD settings and `service_error` events to find the cause.
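The force-closed-turn signature can be checked mechanically over a call's event stream. A sketch, assuming events arrive as dicts with a `type` field plus the `inference_triggered` and `had_content` flags; the event shape and `force_closed_turns` are hypothetical.

```python
from typing import Any, Iterable


def force_closed_turns(events: Iterable[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return user-turn events that match the force-closed-turn signature."""
    flagged = []
    for event in events:
        kind = event.get("type")
        if kind == "turn_stop_timeout":
            # The watchdog fired before the turn closed normally.
            flagged.append(event)
        elif (
            kind == "user_turn_stopped"
            and event.get("inference_triggered") is False
            and event.get("had_content") is True
        ):
            # The customer said something, but it never reached the LLM.
            flagged.append(event)
    return flagged
```

A `user_turn_stopped` with no content and no inference is deliberately not flagged: that is usually noise that was correctly ignored.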

When a call feels bad, engineers should inspect the transcript, recording, events, latency samples, tool telemetry, and final disposition together. One signal alone rarely explains the whole call.

What to tune first

Symptom	First places to check
Bot interrupts too easily	`min_words_interruption`, backchannel text, VAD settings, STT interim behavior
Bot responds too slowly	STT/LLM/TTS latency, TTS cache, tool policy, RAG latency
Bot talks over the customer	turn detection strategy, VAD stop timing, turn-detecting STT settings
Long silence during API calls	custom tool timing policy and pre-message guidance
Bad answers from documents	KB attachment, trigger instructions, `search_knowledge` events
Poor campaign result quality	re-engagement, auto-dispositions, voicemail handling, test-call matrix

This is the communication-quality engineering layer. The model matters, but the production experience comes from these controls working together.