Good call quality depends on more than the LLM. It comes from the combination of listening, turn-taking, speaking, grounding, handoff, reliability, and measurement. CXB Core has several small controls that together make calls feel less robotic on real phone lines.

Quality model

| Layer | What it protects |
| --- | --- |
| Listening | The bot should hear the customer clearly and know when speech starts. |
| Turn-taking | The bot should not interrupt too early or wait forever. |
| Speaking | Responses should sound natural and avoid awkward silence. |
| Context | The bot should use CRM data, tools, and knowledge without guessing. |
| Recovery | Silence, voicemail, transfer, and failure cases should end cleanly. |
| Measurement | Engineers should be able to see why a call felt slow or wrong. |

Better listening

CXB Core uses multiple listening controls before the LLM ever sees a user turn:
  • Voice activity detection (VAD) detects speech start/stop from audio.
  • The turn-detection model helps decide when the user has finished their turn.
  • The turn-detecting STT engine can use external turn events instead of VAD-based turn strategies.
  • Interim transcripts let the pipeline react quickly while speech is still streaming.
  • The initial bot-turn guard protects the opening message from early noise or accidental interruption.
  • Backchannel filtering avoids treating filler sounds like haan, hmm, okay, or ji as meaningful interruptions.
The goal is simple: do not make the bot respond to noise, and do not make the customer repeat themselves.

Better interruption behavior

Phone calls are messy. Customers speak over the bot, give one-word acknowledgements, or say an important phrase while the bot is still talking. CXB Core balances this with:
  • min_words_interruption, which blocks accidental one-word interruptions.
  • high-intent interruption detection for phrases such as wrong number, already paid, agent, settlement, and similar Hindi/English variants.
  • TTS cancellation when the customer truly interrupts.
  • user-turn strategies that use transcript content, not only raw VAD blips.
This is one of the highest-value parts of the runtime. It makes the bot feel interruptible without letting every small sound derail the conversation.
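The interruption rules above can be sketched as a single decision function. This is a minimal illustration, not CXB Core's implementation: the function name `should_interrupt`, the phrase lists, and the default threshold are assumptions. Only `min_words_interruption` and the example phrases come from the text.

```python
import re

# Hypothetical lists for illustration; a real deployment would configure these.
BACKCHANNELS = {"haan", "hmm", "okay", "ji"}
HIGH_INTENT_PHRASES = ("wrong number", "already paid", "agent", "settlement")


def should_interrupt(transcript: str, min_words_interruption: int = 2) -> bool:
    """Decide whether speech heard during bot playback should cancel TTS."""
    words = re.findall(r"\w+", transcript.lower())
    if not words:
        return False  # audio activity with no recognised text
    text = " ".join(words)
    # High-intent phrases interrupt even when they are a single word.
    if any(re.search(rf"\b{re.escape(p)}\b", text) for p in HIGH_INTENT_PHRASES):
        return True
    # Pure backchannel ("hmm", "ji ji") never interrupts.
    if all(word in BACKCHANNELS for word in words):
        return False
    # Anything else needs enough words to count as a real interruption.
    return len(words) >= min_words_interruption
```

Checking high-intent phrases before the word-count rule is the design choice that lets a one-word "agent" through while a one-word "wait" is still ignored.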

Better speaking

Speaking quality comes from both voice selection and timing.

| Control | Benefit |
| --- | --- |
| Multiple TTS engines | Better voices for English, Hindi, and Indian-language campaigns. |
| TTS Redis cache | Lower latency and lower TTS cost for repeated phrases. |
| Live prompt cache | Lower time-to-first-token and lower LLM cost by serving the static system prompt from a provider-side cache. See Caching. |
| Configurable opening message | Ops can tune the first impression per bot. |
| Re-engagement messages | Silence is handled conversationally instead of as an abrupt failure. |
| Tool pre-message policies | The bot can speak while a slow API is running. |

Custom tools can run with timing policies such as immediate, speak-then-run, speak-and-run-parallel, and terminal-after-speech. This prevents avoidable dead air during API calls.
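A rough sketch of how these timing policies differ, using asyncio. The enum, the `run_tool` function, and the `speak` and `call_api` callables are hypothetical names for illustration. Treating terminal-after-speech the same as speak-then-run is an assumption of this sketch, not documented behavior.

```python
import asyncio
from enum import Enum
from typing import Awaitable, Callable


class ToolTiming(Enum):
    IMMEDIATE = "immediate"
    SPEAK_THEN_RUN = "speak-then-run"
    SPEAK_AND_RUN_PARALLEL = "speak-and-run-parallel"
    TERMINAL_AFTER_SPEECH = "terminal-after-speech"


async def run_tool(
    policy: ToolTiming,
    speak: Callable[[], Awaitable[None]],
    call_api: Callable[[], Awaitable[str]],
) -> str:
    """Run a tool call, ordering the pre-message according to the policy."""
    if policy is ToolTiming.IMMEDIATE:
        return await call_api()  # no pre-message at all
    if policy is ToolTiming.SPEAK_AND_RUN_PARALLEL:
        # The pre-message covers the API latency instead of adding to it.
        _, result = await asyncio.gather(speak(), call_api())
        return result
    # Assumption: both remaining policies finish speaking before the call runs.
    await speak()
    return await call_api()
```

With the parallel policy the caller hears the pre-message while the request is in flight, so total wait is roughly the longer of the two durations rather than their sum.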

Silence and dead air handling

Dead air is common in outbound campaigns. CXB Core handles it explicitly:
  1. Wait for the configured silence gap.
  2. Speak a re-engagement message.
  3. Shorten or repeat gaps as configured.
  4. End as RNR after retries are exhausted.
Calls where the customer never spoke skip LLM analysis and receive auto-dispositions. This keeps reporting clean and avoids wasting post-call tokens on empty transcripts.
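The four steps above can be written out as a plan generator. This is a sketch under stated assumptions: `SilenceConfig`, its field names and defaults, and the action labels are invented for illustration and are not CXB Core settings.

```python
from dataclasses import dataclass
from typing import Iterator, Tuple


@dataclass
class SilenceConfig:
    gap_seconds: float = 8.0      # assumed first silence gap
    max_retries: int = 2          # re-engagement attempts before giving up
    gap_multiplier: float = 0.75  # below 1 shortens later gaps, 1.0 repeats them


def silence_plan(cfg: SilenceConfig) -> Iterator[Tuple[str, float]]:
    """Yield (action, seconds) steps for a call where the customer stays silent."""
    gap = cfg.gap_seconds
    for _ in range(cfg.max_retries):
        yield ("wait", gap)
        yield ("reengage", 0.0)
        gap *= cfg.gap_multiplier
    # One last wait after the final re-engagement, then close as RNR.
    yield ("wait", gap)
    yield ("end_call_rnr", 0.0)
```

In a live call the plan would be abandoned as soon as the customer speaks; the generator only describes the path taken when they never do.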

Context and grounding

The bot sounds better when it has the right facts at the right time.
  • CRM variables are injected into prompts and opening messages.
  • SIP headers and variables are preserved for transfer and tool context.
  • Knowledge Base RAG adds the search_knowledge tool only when knowledge is enabled and attached.
  • Custom HTTP tools let the bot fetch or update external systems during the call.
  • Tool telemetry records arguments, response payloads, status, timeout, and duration.
The design principle is that CXB Core executes the call, while CXB API owns the business data and configuration.
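Two of the grounding rules above are easy to illustrate: variable injection and conditional tool registration. The sketch below assumes `$name`-style placeholders (Python's `string.Template` syntax); the real placeholder syntax is not stated here, and both function names are hypothetical.

```python
from string import Template


def render_prompt(template: str, crm_variables: dict[str, str]) -> str:
    """Fill $placeholders from CRM data, failing loudly on a missing value."""
    try:
        return Template(template).substitute(crm_variables)
    except KeyError as exc:
        raise ValueError(f"missing CRM variable: {exc.args[0]}") from exc


def tools_for_call(
    custom_tools: list[str],
    knowledge_enabled: bool,
    attached_kb_ids: list[str],
) -> list[str]:
    """Expose search_knowledge only when knowledge is enabled and attached."""
    tools = list(custom_tools)
    if knowledge_enabled and attached_kb_ids:
        tools.append("search_knowledge")
    return tools
```

Raising on a missing variable, rather than leaving the placeholder in the text, keeps the bot from reading a raw template token to the customer.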

Human handoff

When automation should stop, the bot needs a clean exit path. CXB Core supports:
  • end_call for intentional bot hangup.
  • transfer_call for configured phone/SIP/WebSocket transfer targets.
  • Agent Desk handoff for live human queues.
  • handoff context containing transcript, variables, customer details, flags, tags, and reason.
  • durable Agent Desk outbox retry if enqueue fails temporarily.
Good communication quality includes knowing when not to keep the bot talking.
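The handoff context and the outbox retry can be pictured roughly as follows. Everything named here is an assumption for illustration: the `HandoffContext` fields mirror the list above, the `enqueue` callable stands in for the Agent Desk client, and a plain list stands in for durable storage.

```python
from dataclasses import asdict, dataclass, field
from typing import Callable


@dataclass
class HandoffContext:
    reason: str
    transcript: list[str] = field(default_factory=list)
    variables: dict[str, str] = field(default_factory=dict)
    customer: dict[str, str] = field(default_factory=dict)
    flags: list[str] = field(default_factory=list)
    tags: list[str] = field(default_factory=list)


def enqueue_or_park(ctx: HandoffContext, enqueue: Callable[[dict], None], outbox: list[dict]) -> bool:
    """Try to enqueue the handoff; park it in the outbox on a transient failure."""
    payload = asdict(ctx)
    try:
        enqueue(payload)
        return True
    except ConnectionError:
        outbox.append(payload)
        return False


def drain_outbox(enqueue: Callable[[dict], None], outbox: list[dict]) -> int:
    """Retry parked handoffs in order; stop at the first failure. Returns count sent."""
    sent = 0
    while outbox:
        try:
            enqueue(outbox[0])
        except ConnectionError:
            break
        outbox.pop(0)  # remove only after a confirmed send
        sent += 1
    return sent
```

Removing an entry only after a successful send means a crash mid-retry can duplicate a handoff but cannot lose one, which is the usual trade-off for an outbox.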

Reliability during live calls

Runtime isolation also affects communication quality:

| Control | Why it matters |
| --- | --- |
| One call per worker | A stuck pipeline affects one call, not many calls. |
| nginx max_conns=1 | A busy worker should not receive another live call. |
| CallTracker | App-level capacity guard for call routes. |
| Result outbox | Call results survive transient CXB API failures. |
| Agent Desk outbox | Human handoff attempts are not lost on transient enqueue failure. |
| Shared post-call logic | All transports get the same disposition, analysis, QC, and webhook behavior. |

Measurement

CXB Core emits enough detail to debug call quality after the fact:

| Signal | Use |
| --- | --- |
| stt_ms | Detect slow speech recognition. |
| llm_ms | Detect slow model response. |
| tts_ms | Detect slow speech generation. |
| tool_ms | Detect slow custom API calls. |
| rag_ms | Detect slow knowledge search. |
| Events | See tool calls, knowledge search, disconnect reason, and pipeline state. |
| service_error event | Classifies a pipeline failure as llm/tts/stt/unknown, with the failing processor, truncated message, and a fatal flag. First place to look when a call ends abruptly. |
| User-turn events | user_turn_started / user_turn_inference_triggered / turn_stop_timeout / user_turn_stopped show how each turn was detected and closed. |
| Recording | Hear what the customer heard. |
| Transcript | Compare what was said with what was recognized. |

The user-turn stop-reason telemetry explains the hardest call-quality bugs. A turn_stop_timeout event, or a user_turn_stopped event with inference_triggered = false but had_content = true, means the customer's turn was force-closed by the watchdog and its transcript never reached the LLM. That pattern is the signature of dead air caused by a force-closed turn. Pair it with VAD settings and service_error events to find the cause.
When a call feels bad, engineers should inspect the transcript, recording, events, latency samples, tool telemetry, and final disposition together. One signal alone rarely explains the whole call.
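The force-closed-turn signature can be checked mechanically across a call's event log. The sketch below assumes each event is a dict with a `type` key plus the `inference_triggered` and `had_content` fields named above; the actual event schema may differ, and `force_closed_turns` is a hypothetical helper.

```python
def force_closed_turns(events: list[dict]) -> list[dict]:
    """Return events showing a user turn was closed without reaching the LLM."""
    flagged = []
    for event in events:
        kind = event.get("type")
        if kind == "turn_stop_timeout":
            flagged.append(event)
        elif (
            kind == "user_turn_stopped"
            and event.get("inference_triggered") is False
            and event.get("had_content") is True
        ):
            # The customer said something, but inference never ran on it.
            flagged.append(event)
    return flagged
```

A turn that stopped with no content and no inference is left out on purpose: that is ordinary silence, not a lost transcript.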

What to tune first

| Symptom | First places to check |
| --- | --- |
| Bot interrupts too easily | min_words_interruption, backchannel text, VAD settings, STT interim behavior |
| Bot responds too slowly | STT/LLM/TTS latency, TTS cache, tool policy, RAG latency |
| Bot talks over the customer | turn detection strategy, VAD stop timing, turn-detecting STT settings |
| Long silence during API calls | custom tool timing policy and pre-message guidance |
| Bad answers from documents | KB attachment, trigger instructions, search_knowledge events |
| Poor campaign result quality | re-engagement, auto-dispositions, voicemail handling, test-call matrix |

This is the communication-quality engineering layer. The model matters, but the production experience comes from these controls working together.
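For the slow-response symptom, a first pass is often just finding which stage dominated a turn. A minimal sketch, assuming one latency sample is a dict keyed by the signal names listed under Measurement; `slowest_stage` is a hypothetical helper, not part of CXB Core.

```python
STAGES = ("stt_ms", "llm_ms", "tts_ms", "tool_ms", "rag_ms")


def slowest_stage(sample: dict[str, float]) -> tuple[str, float]:
    """Return the (stage, milliseconds) pair with the highest latency in a sample."""
    present = {name: sample[name] for name in STAGES if name in sample}
    if not present:
        raise ValueError("sample contains no known latency signals")
    name = max(present, key=present.get)
    return name, present[name]
```

This only narrows where to look; as noted above, one signal alone rarely explains the whole call.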