
Key Concepts

Understanding these core concepts will help you get the most out of BlackBox’s AI voice agent platform.

Agents

What is an Agent? An agent is a conversational AI configured to handle voice interactions. Think of it as your AI representative that can conduct natural conversations with callers over the phone or through web interfaces.

Agent Components

Every agent consists of several key components:
  • System Prompt: Instructions that define your agent’s personality, role, and behavior guidelines
  • Language Model (LLM): The AI engine that powers conversation intelligence - OpenAI GPT-4, Groq, xAI Grok, DeepSeek, or custom providers
  • Voice Configuration: Text-to-speech voice, speed, and provider-specific settings (ElevenLabs, Cartesia, Dasha, Inworld, LMNT)
  • Speech Recognition: Automatic speech recognition (ASR) for understanding caller speech - Auto, Deepgram, or Microsoft
  • Tools & Functions: External API integrations your agent can call during conversations
  • Scheduling: Business hours, timezone, and availability settings
  • Advanced Features: Call transfers, post-call analysis, ambient noise handling

Agent Lifecycle States

Agents have a simple enabled/disabled toggle controlled by the isEnabled boolean field.
Enabled (isEnabled: true)
  • Agent is active and ready to take calls
  • Default state - new agents are created enabled unless explicitly disabled
  • Configuration changes take effect immediately
  • Available for inbound calls, outbound calls, and web integrations
  • Billable when handling calls
Disabled (isEnabled: false)
  • Temporarily inactive (not deleted)
  • Configuration preserved for future use
  • Not available for any call routing
  • No billing when disabled
New agents default to isEnabled: true. If you need to test first, either create the agent with isEnabled: false via the API, or disable it in the dashboard immediately after creation, then enable it when ready for production.
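The create-disabled-then-enable pattern can be sketched as a request body. Only the isEnabled field is documented here; the endpoint path, "name", and "systemPrompt" field names are illustrative assumptions, not the official API shape.

```python
# Sketch: build a create-agent request body with the agent disabled,
# so it can be tested before it starts taking (billable) calls.
# Field names other than isEnabled are assumptions for illustration.
import json

def build_agent_payload(name, system_prompt, is_enabled=False):
    """Assemble a hypothetical create-agent request body."""
    return {
        "name": name,                 # assumed field name
        "systemPrompt": system_prompt,  # assumed field name
        "isEnabled": is_enabled,      # documented: False = not routed, not billed
    }

payload = build_agent_payload("Support Bot", "You are a helpful support agent.")
body = json.dumps(payload)  # POST this to the create-agent endpoint
```

Once testing is done, flip isEnabled to true via the API or the dashboard toggle.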

Primary Language Support

BlackBox supports 40 languages:
  • English Variants: en-US, en-GB, en-AU, en-CA
  • European Languages: German, French (France & Canada), Spanish (Spain & Mexico), Portuguese (Brazil & Portugal), Italian, Dutch, Turkish, Polish, Swedish, Bulgarian, Romanian, Czech, Greek, Finnish, Croatian, Slovak, Danish, Ukrainian, Russian, Hungarian, Norwegian
  • Asian Languages: Japanese, Chinese, Korean, Hindi, Indonesian, Filipino, Malay, Tamil, Vietnamese, Thai
  • Arabic: ar-SA, ar-AE
The primary language setting affects speech recognition accuracy and determines available voice options.

Calls

What is a Call? A call represents a single voice interaction between your agent and a caller. Calls can be inbound (received), outbound (initiated by your agent), or test calls made during development.

Call Types

Inbound Calls
  • Calls received by your agent when someone dials your phone number or SIP URI
  • Handled automatically by enabled agents
  • Routed based on agent availability and schedule
  • Support call recording and real-time transcription
Outbound Calls
  • Calls initiated by your agent to phone numbers you specify
  • Scheduled via API or dashboard interface
  • Support single or bulk call scheduling
  • Priority-based queue management with DWRR (Dynamic Weighted Round Robin) scheduling
Test Calls
  • Browser-based calls for testing agent behavior
  • No phone number required
  • Uses WebRTC for audio streaming, WebSocket for signaling
  • Ideal for development and debugging

Call Lifecycle

Every call progresses through distinct states:
  1. Created: Call scheduled but not yet queued
  2. Pending: Call waiting for queue admission
  3. Queued: Call in queue waiting for available resources
  4. Running: Call currently active
  5. Completed: Call finished successfully
  6. Failed: Call encountered an error
  7. Canceled: Call removed from queue before completion
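The lifecycle above can be expressed as a transition table. The state names come from this page; which transitions are legal is an assumption inferred from the state descriptions, not a documented state machine.

```python
# Sketch of the documented call lifecycle as a transition table.
# State names are from the docs; the allowed transitions are inferred.
TRANSITIONS = {
    "created":   {"pending"},
    "pending":   {"queued", "canceled"},
    "queued":    {"running", "canceled"},
    "running":   {"completed", "failed"},
    "completed": set(),  # terminal
    "failed":    set(),  # terminal
    "canceled":  set(),  # terminal
}

def can_transition(src, dst):
    """Return True if moving from src to dst is a legal lifecycle step."""
    return dst in TRANSITIONS.get(src, set())
```

A table like this is handy in webhook handlers for sanity-checking out-of-order event delivery.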

Call Priority and Scheduling

Priority Levels: Integer value where lower numbers = higher urgency (ascending order scheduling)
  • Priority 0: Highest urgency, scheduled first
  • Priority 4: Normal priority (API default when not specified)
  • Priority 7: Default in UI scheduling forms
  • Priority 10: Lowest urgency, scheduled last
The API accepts any integer value for priority - the scheduler processes calls in ascending order (lower numbers first).
Call Deadlines
  • Optional ISO 8601 timestamp specifying when to cancel if not started
  • Supports timezone-aware scheduling
  • Triggers deadline webhook if call exceeds timeout
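Because deadlines are timezone-aware ISO 8601 timestamps, comparing them correctly means comparing aware datetimes, not naive ones. A minimal client-side check (the helper itself is an assumption, not platform code):

```python
# Sketch: checking whether a call's optional ISO 8601 deadline has passed.
# datetime.fromisoformat parses timezone-aware strings like "+01:00".
from datetime import datetime, timezone

def deadline_passed(deadline_iso, now=None):
    """Return True if the deadline is in the past (call would be canceled)."""
    deadline = datetime.fromisoformat(deadline_iso)
    now = now or datetime.now(timezone.utc)
    return now >= deadline

# A deadline given in UTC+1 compared against a fixed UTC reference instant:
ref = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
passed = deadline_passed("2025-01-01T10:00:00+01:00", now=ref)  # 09:00 UTC < 12:00 UTC
```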
DWRR Scheduling
  • Dynamic Weighted Round Robin algorithm ensures fair resource allocation
  • Balances priority with wait time
  • Prevents priority inversion (low priority calls waiting forever)
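The ordering contract — ascending priority, with wait time keeping low-priority calls from starving — can be illustrated with a simplified sort. This is not the actual DWRR algorithm, which is considerably more involved; it only demonstrates the documented ordering behavior.

```python
# Simplified sketch of the scheduling contract: lower priority number
# first, with longer-waiting calls breaking ties. The real DWRR
# scheduler is more sophisticated; this only illustrates the ordering.
def next_calls(queue):
    """queue: list of (call_id, priority, wait_seconds) tuples."""
    return sorted(queue, key=lambda c: (c[1], -c[2]))

queue = [("a", 7, 5), ("b", 0, 1), ("c", 4, 30), ("d", 4, 2)]
order = [call_id for call_id, _, _ in next_calls(queue)]
# "b" (priority 0) runs first; of the two priority-4 calls, "c" has waited longer
```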

Web Integrations

What are Web Integrations? Web integrations allow you to embed your AI agent directly into websites and applications through customizable widgets. Users can interact with your agent via voice or text chat without leaving your site.

Integration Types

Embeddable Widget
  • JavaScript snippet added to your website
  • Customizable appearance (theme and position)
  • Supports voice calls, text chat, or both
  • Feature gating for granular control
API Integration
  • Direct backend integration with your application
  • Token-based authentication
  • Full programmatic control over agent interactions
WebSocket Connection
  • Real-time bidirectional signaling and event streaming
  • Used by widgets for call setup and control messages
  • Audio transport handled by WebRTC (not WebSocket)

Security Features

Origin Restrictions
  • Whitelist specific domains that can use your integration
  • Supports wildcard patterns for subdomains (e.g., *.example.com)
  • Prevents unauthorized access to your agent
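Wildcard subdomain matching of the `*.example.com` kind can be sketched with glob-style matching. This mirrors the documented behavior but is an assumption about the matching semantics, not the platform's implementation.

```python
# Sketch of origin allow-listing with subdomain wildcards, assuming
# "*.example.com"-style patterns match subdomains via glob semantics.
from fnmatch import fnmatch

def origin_allowed(origin_host, patterns):
    """Return True if origin_host matches any allow-listed pattern."""
    return any(origin_host == p or fnmatch(origin_host, p) for p in patterns)

allowed = ["app.example.com", "*.widgets.example.com"]
```

Note that with glob semantics, `*.widgets.example.com` does not match the bare `widgets.example.com` — list the apex host explicitly if you need it.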
Access Tokens
  • Each integration can have multiple tokens
  • Token names help track usage (e.g., “Production Website”, “Staging Environment”)
  • Tokens are always visible in the dashboard - you can copy them anytime
  • Revoke compromised tokens immediately
Feature Gating
  • Enable only the features needed for each integration
  • Available features:
    • AllowWebCall: Voice calls through widget
    • AllowWebChat: Text chat through widget
    • AllowPhoneCall: Phone calls initiated from widget
    • SendCallResult: Send call results back to widget
    • SendToolCallLogs: Send tool execution logs to widget
    • SendTranscriptForAudioCall: Send transcripts to widget
Never use wildcard * for all origins in production. Always specify exact domains or subdomain patterns.
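On the client side, feature gating amounts to checking a capability against the integration's enabled flags before attempting it. The feature names below are documented; the config shape is assumed.

```python
# Sketch: gating widget capabilities by the integration's feature flags.
# Feature names come from the docs; the config representation is assumed.
ENABLED_FEATURES = {"AllowWebCall", "AllowWebChat"}  # this integration's flags

def feature_enabled(name, enabled=ENABLED_FEATURES):
    """Check a capability before exposing the corresponding UI."""
    return name in enabled

can_voice = feature_enabled("AllowWebCall")    # enabled above
can_phone = feature_enabled("AllowPhoneCall")  # not enabled above
```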

Widget Customization

Theme
  • Light or dark mode
Position
  • bottom-right (most common)
  • bottom-left
  • bottom-center
  • top-right
  • top-left
  • top-center

Webhooks

What are Webhooks? Webhooks provide real-time HTTP notifications when specific call events occur. Your server receives POST requests with detailed payload information, enabling custom integrations and workflows.

Webhook Types

Result Webhook (CompletedWebHookPayload)
  • Triggered when call completes successfully
  • Contains full conversation transcript
  • Includes post-call analysis results (if configured)
  • Most commonly used for logging and analytics
Start Webhook (StartWebHookPayload)
  • Triggered when call begins
  • Requires fallback response if webhook fails
  • Used for dynamic configuration based on caller context
  • Can modify agent behavior per-call
Failed Webhook (FailedWebHookPayload)
  • Triggered when call fails due to error
  • Contains error details and failure reason
  • Useful for alerting and troubleshooting
Deadline Webhook (CallDeadLineWebHookPayload)
  • Triggered when call exceeds configured timeout
  • Indicates call was cancelled before starting
  • Helps identify scheduling or queue issues
Tool Webhook (ToolWebHookPayload)
  • Triggered when agent calls a function/tool
  • Your webhook must return result within timeout
  • Enables real-time API integrations during conversations
  • Supports synchronous external data lookups
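A webhook endpoint typically dispatches on the payload type and routes each to its own handler. The payload type names are from this page; the "type" discriminator field and "callId" key are assumptions — check the actual payload schema for the real field names.

```python
# Sketch of a webhook handler dispatching on the documented payload
# types. The "type" and "callId" field names are assumptions.
def handle_webhook(payload):
    handlers = {
        "CompletedWebHookPayload": lambda p: f"log transcript for {p['callId']}",
        "FailedWebHookPayload": lambda p: f"alert: call {p['callId']} failed",
        "CallDeadLineWebHookPayload": lambda p: f"deadline missed for {p['callId']}",
    }
    handler = handlers.get(payload.get("type"))
    return handler(payload) if handler else "ignored"

result = handle_webhook({"type": "FailedWebHookPayload", "callId": "c-42"})
```

In production this function would sit behind an HTTP endpoint receiving the POST requests; remember that start and tool webhooks must respond within their timeouts.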

Use Cases

  • CRM Integration: Log call results to Salesforce, HubSpot, etc.
  • Custom Analytics: Send call data to your analytics platform
  • Dynamic Routing: Use start webhook to configure agent per-caller
  • External Tool Calls: Fetch customer data, check inventory, book appointments
  • Alerting: Notify team when calls fail or important events occur
Use the /api/v1/webhooks/test endpoint to validate your webhook handlers before deploying to production.

Tools & Functions

What are Tools? Tools enable your agent to perform actions beyond conversation - calling external APIs, looking up data, executing functions. Think of tools as giving your agent “superpowers” to interact with the real world.

Tool Types

Webhook Tools
  • Custom functions you define with JSON schema
  • Agent calls your webhook when tool is needed
  • Webhook returns result to agent
  • Agent incorporates result into conversation
MCP Tools (Model Context Protocol)
  • Connect to MCP-compatible servers
  • Auto-discover available tools from server
  • Standard protocol for tool integration
  • Verify connections before deployment

Tool Configuration

Tool Schema (JSON Schema format)
{
  "type": "object",
  "properties": {
    "location": {
      "type": "string",
      "description": "City name"
    },
    "units": {
      "type": "string",
      "enum": ["celsius", "fahrenheit"]
    }
  },
  "required": ["location"]
}
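To make the schema concrete, here is a minimal stdlib-only check of tool parameters against the schema above — a sketch of what the platform's validation step does conceptually, not its actual implementation (a real service would use a full JSON Schema validator).

```python
# Minimal validation of tool parameters against the schema shown above,
# stdlib only. Illustrative sketch; use a JSON Schema library in practice.
SCHEMA = {
    "type": "object",
    "properties": {
        "location": {"type": "string", "description": "City name"},
        "units": {"type": "string", "enum": ["celsius", "fahrenheit"]},
    },
    "required": ["location"],
}

def validate(params, schema=SCHEMA):
    """Return a list of validation error messages (empty means valid)."""
    errors = []
    for field in schema.get("required", []):
        if field not in params:
            errors.append(f"missing required field: {field}")
    for name, value in params.items():
        spec = schema["properties"].get(name)
        if spec is None:
            continue  # unknown properties are ignored here
        if spec["type"] == "string" and not isinstance(value, str):
            errors.append(f"{name}: expected string")
        if "enum" in spec and value not in spec["enum"]:
            errors.append(f"{name}: must be one of {spec['enum']}")
    return errors
```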
Tool Description
  • Clear, concise description of what the tool does
  • Agent uses description to decide when to call tool
  • Be specific about expected inputs and outputs
Fallback Result
  • Returned if webhook fails or times out
  • Prevents conversation from breaking on tool failures
  • Should provide graceful degradation

Function Calling Flow

  1. User asks question requiring external data
  2. Agent determines which tool to call based on descriptions
  3. Agent extracts parameters from conversation
  4. BlackBox validates parameters against tool schema
  5. Webhook called with tool name and parameters
  6. Your server processes request and returns result
  7. Agent receives result and incorporates into response
  8. Conversation continues naturally
Tools are optional. Simple conversational agents don’t need tools - only use them when your agent needs to interact with external systems.
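The eight steps above can be sketched end to end with a fake weather tool. Everything here is hypothetical (the tool, its data, the hardcoded parameter extraction); in production, step 5 is an HTTP call from BlackBox to your webhook.

```python
# End-to-end sketch of the function calling flow with a fake tool.
# The tool, its data, and the parameter extraction are all hypothetical.
def weather_tool(params):
    # steps 5-6: your server receives the call and processes the request
    data = {"Oslo": "-3°C, snow"}
    return data.get(params["location"], "unknown")

def agent_turn(user_text):
    # steps 2-4: decide on the tool and extract/validate parameters
    # (hardcoded here; the LLM does this from the conversation)
    params = {"location": "Oslo"}
    result = weather_tool(params)
    # steps 7-8: the agent weaves the result into its reply
    return f"It's currently {result} in {params['location']}."

reply = agent_turn("What's the weather in Oslo?")
```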

Voice Configuration

Voice Components
BlackBox uses three core components for voice interactions:

Text-to-Speech (TTS)

Converts the agent’s text responses into natural-sounding speech.
Supported Providers
| Provider | Key Features | Speed Range | Specialty |
| --- | --- | --- | --- |
| ElevenLabs | Premium quality, emotion control, multilingual | 0.7x - 1.2x | Professional voiceovers, voice cloning |
| Cartesia | Emotion presets, sonic model | 0.0x - 2.0x | Emotional range (anger, positivity, surprise, sadness, curiosity) |
| Dasha | Optimized for real-time, low latency | 0.25x - 4.0x | Conversational AI, fast response |
| Inworld | Character voices, game-ready | 0.8x - 1.5x | Gaming, interactive experiences |
| LMNT | Natural voices, blizzard model | 1.0x (fixed) | Natural conversation flow |
Provider-Specific Options
ElevenLabs:
  • Similarity Boost (0.0 - 1.0): Voice clarity enhancement
  • Stability (0.0 - 1.0): Consistency of voice characteristics
  • Style (0.0 - 1.0): Expressiveness level
  • Speaker Boost: Enable for clearer output
  • Streaming Latency Optimization (0-4): Higher = faster but lower quality
Cartesia:
  • Emotions: Mix multiple emotion levels
    • Anger: lowest, low, high, highest
    • Positivity: lowest, low, high, highest
    • Surprise: lowest, low, high, highest
    • Sadness: lowest, low, high, highest
    • Curiosity: lowest, low, high, highest
Inworld:
  • Temperature: Voice variation control
  • Pitch: Voice pitch adjustment
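A practical consequence of the per-provider speed ranges in the table above is that a requested speed should be clamped before being sent. The ranges come from the table; the helper itself is an illustrative assumption.

```python
# Sketch: clamp a requested TTS speed to the provider's documented range.
# Ranges are from the provider table above; the helper is illustrative.
SPEED_RANGES = {
    "elevenlabs": (0.7, 1.2),
    "cartesia":   (0.0, 2.0),
    "dasha":      (0.25, 4.0),
    "inworld":    (0.8, 1.5),
    "lmnt":       (1.0, 1.0),  # fixed speed
}

def clamp_speed(provider, requested):
    """Return the nearest speed the provider actually supports."""
    lo, hi = SPEED_RANGES[provider]
    return max(lo, min(hi, requested))
```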

Automatic Speech Recognition (ASR)

Converts the caller’s speech into text for the LLM to process.
Available Options
  • Auto (default): Automatically selects best provider based on language and availability
  • Deepgram: High accuracy, real-time streaming, supports 36+ languages
  • Microsoft: Azure Speech Services, excellent for enterprise scenarios
Use “Auto” ASR unless you have specific provider requirements. The system automatically optimizes for your agent’s primary language.

Voice Cloning

Create custom voices from audio samples. Supported providers: ElevenLabs, Cartesia, and Dasha.
Voice Cloning Process
  1. Upload clear audio sample (30 seconds to 10 minutes, high quality)
  2. Provider analyzes and creates voice model
  3. Cloned voice becomes available in voice selector
  4. Test with preview before deploying to agent
  5. Update or delete cloned voices as needed
Best Practices
  • Use high-quality recordings (clear audio, minimal background noise)
  • Single speaker only in sample
  • Consistent tone and speaking style
  • 1-2 minutes of audio recommended for best results (minimum 30 seconds)
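The length limits and recommendation above can be encoded as a simple pre-upload check. The thresholds are from this page; the helper is illustrative.

```python
# Sketch: pre-upload check that a cloning sample meets the documented
# limits (30 seconds to 10 minutes; 1-2 minutes recommended).
def check_sample(duration_s):
    """Return a verdict string for a sample of the given length in seconds."""
    if duration_s < 30:
        return "too short (minimum 30 seconds)"
    if duration_s > 600:
        return "too long (maximum 10 minutes)"
    if not 60 <= duration_s <= 120:
        return "ok, but 1-2 minutes is recommended for best results"
    return "ok"
```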

Platform Architecture

BlackBox is built on top of the Dasha.ai conversational AI platform, providing enterprise-grade voice AI capabilities through a simple REST API.

Architecture Overview

Integration Points

REST API
  • Full CRUD operations for agents and integrations
  • Create, list, and cancel calls (no update operations)
  • Synchronous request/response for configuration and data retrieval
  • OpenAPI/Swagger specification for all endpoints
Webhooks
  • Real-time event notifications (call started, completed, failed)
  • Tool/function call handlers for agent actions
WebSocket
  • Signaling and event streaming for call monitoring
  • Used alongside WebRTC for real-time voice communication
  • WebRTC handles actual audio transport with low latency
Dasha Platform Benefits
  • Enterprise-grade reliability and scalability
  • Global telephony infrastructure
  • Multi-provider AI orchestration
  • Production-ready voice AI technology
Learn more about the underlying Dasha platform at dasha.ai.

Workflow Overview

The typical BlackBox workflow follows this lifecycle:

1. Build

Create and configure your AI agent with desired personality, voice, and capabilities.
  • Define system prompt and role
  • Select LLM vendor and model
  • Choose TTS voice and provider
  • Configure tools and functions (optional)
  • Set business hours and schedule (optional)
  • Enable advanced features (optional)

2. Test

Test your agent using dashboard testing tools to ensure correct behavior.
  • Use test widget for browser-based calls
  • Review conversation transcripts
  • Debug with developer toolbar
  • Iterate on configuration based on test results

3. Deploy

Connect phone numbers or embed web widgets to make your agent available to users.
  • For Phone: Configure SIP settings, assign phone number
  • For Web: Create web integration, generate token, embed widget
  • For Both: Enable agent and verify routing

4. Monitor

Track performance, analyze call transcripts, and optimize based on real usage data.
  • Review call history and transcripts
  • Analyze agent performance metrics
  • Monitor concurrency and resource usage
  • Optimize based on post-call analysis

Key Benefits

For Developers

  • REST API: Full programmatic control over all platform features
  • Webhooks: Real-time event notifications for custom integrations
  • WebSocket: Low-latency bidirectional communication
  • Scalability: Organization-specific concurrency limits (contact support for capacity planning)
  • Flexibility: Choose from multiple LLM and voice providers
  • Standards: MCP support for tool integrations

For Business Users

  • No-Code Interface: Create agents without programming knowledge
  • Quick Setup: Agent creation through dashboard interface
  • Visual Dashboard: Monitor and manage everything through intuitive UI
  • Multi-Channel: Deploy to phone, web, or both simultaneously
  • Analytics: Built-in performance tracking and call analysis
  • Professional Support: Dedicated support team for enterprise needs

Next Steps

Now that you understand the key concepts, you’re ready to:

API Cross-References