
Key Concepts

Understanding these core concepts will help you get the most out of BlackBox’s AI voice agent platform.

Agents

What is an Agent? An agent is a conversational AI configured to handle voice interactions. Think of it as your AI representative that can conduct natural conversations with callers over the phone or through web interfaces.

Agent Components

Every agent consists of several key components:
  • System Prompt: Instructions that define your agent’s personality, role, and behavior guidelines
  • Language Model (LLM): The AI engine that powers conversation intelligence - OpenAI GPT-4, Groq, xAI Grok, DeepSeek, or custom providers
  • Voice Configuration: Text-to-speech voice, speed, and provider-specific settings (ElevenLabs, Cartesia, Dasha, Inworld, LMNT)
  • Speech Recognition: Automatic speech recognition (ASR) for understanding caller speech - Auto, Deepgram, or Microsoft
  • Tools & Functions: External API integrations your agent can call during conversations
  • Scheduling: Business hours, timezone, and availability settings
  • Advanced Features: Call transfers, post-call analysis, ambient noise handling

Agent Lifecycle States

Agents have a simple enabled/disabled toggle controlled by the isEnabled boolean field.
Enabled (isEnabled: true)
  • Agent is active and ready to take calls
  • Default state - new agents are created enabled unless explicitly disabled
  • Configuration changes take effect immediately
  • Available for inbound calls, outbound calls, and web integrations
  • Billable when handling calls
Disabled (isEnabled: false)
  • Temporarily inactive (not deleted)
  • Configuration preserved for future use
  • Not available for any call routing
  • No billing when disabled
New agents default to isEnabled: true. If you need to test first, either create the agent with isEnabled: false via the API, or disable it in the dashboard immediately after creation, then enable it when ready for production.
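The create-disabled-then-enable pattern can be sketched as a request body. Only the isEnabled field is documented here; the endpoint path, "name", and "systemPrompt" field names are illustrative assumptions, not the official API shape.

```python
# Sketch: build a create-agent request body with the agent disabled,
# so it can be tested before it starts taking (billable) calls.
# Field names other than isEnabled are assumptions for illustration.
import json

def build_agent_payload(name, system_prompt, is_enabled=False):
    """Assemble a hypothetical create-agent request body."""
    return {
        "name": name,                 # assumed field name
        "systemPrompt": system_prompt,  # assumed field name
        "isEnabled": is_enabled,      # documented: False = not routed, not billed
    }

payload = build_agent_payload("Support Bot", "You are a helpful support agent.")
body = json.dumps(payload)  # POST this to the create-agent endpoint
```

Once testing is done, flip isEnabled to true via the API or the dashboard toggle.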

Primary Language Support

BlackBox supports 40 languages:
  • English Variants: en-US, en-GB, en-AU, en-CA
  • European Languages: German, French (France & Canada), Spanish (Spain & Mexico), Portuguese (Brazil & Portugal), Italian, Dutch, Turkish, Polish, Swedish, Bulgarian, Romanian, Czech, Greek, Finnish, Croatian, Slovak, Danish, Ukrainian, Russian, Hungarian, Norwegian
  • Asian Languages: Japanese, Chinese, Korean, Hindi, Indonesian, Filipino, Malay, Tamil, Vietnamese, Thai
  • Arabic: ar-SA, ar-AE
The primary language setting affects speech recognition accuracy and determines available voice options.

Calls

What is a Call? A call represents a single voice interaction between your agent and a caller. Calls can be inbound (received), outbound (initiated by your agent), or test calls made during development.

Call Types

Inbound Calls
  • Calls received by your agent when someone dials your phone number or SIP URI
  • Handled automatically by enabled agents
  • Routed based on agent availability and schedule
  • Support call recording and real-time transcription
Outbound Calls
  • Calls initiated by your agent to phone numbers you specify
  • Scheduled via API or dashboard interface
  • Support single or bulk call scheduling
  • Priority-based queue management with DWRR (Dynamic Weighted Round Robin) scheduling
Test Calls
  • Browser-based calls for testing agent behavior
  • No phone number required
  • Uses WebRTC for audio streaming, WebSocket for signaling
  • Ideal for development and debugging

Call Lifecycle

Every call progresses through distinct states:
  1. Created: Call scheduled but not yet queued
  2. Pending: Call waiting for queue admission
  3. Queued: Call in queue waiting for available resources
  4. Running: Call currently active
  5. Completed: Call finished successfully
  6. Failed: Call encountered an error
  7. Canceled: Call removed from queue before completion
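The lifecycle above can be expressed as a transition table. The state names come from this page; which transitions are legal is an assumption inferred from the state descriptions, not a documented state machine.

```python
# Sketch of the documented call lifecycle as a transition table.
# State names are from the docs; the allowed transitions are inferred.
TRANSITIONS = {
    "created":   {"pending"},
    "pending":   {"queued", "canceled"},
    "queued":    {"running", "canceled"},
    "running":   {"completed", "failed"},
    "completed": set(),  # terminal
    "failed":    set(),  # terminal
    "canceled":  set(),  # terminal
}

def can_transition(src, dst):
    """Return True if moving from src to dst is a legal lifecycle step."""
    return dst in TRANSITIONS.get(src, set())
```

A table like this is handy in webhook handlers for sanity-checking out-of-order event delivery.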

Call Priority and Scheduling

Priority Levels: Integer value where lower numbers = higher urgency (ascending order scheduling)
  • Priority 0: Highest urgency, scheduled first
  • Priority 4: Normal priority (API default when not specified)
  • Priority 7: Default in UI scheduling forms
  • Priority 10: Lowest urgency, scheduled last
The API accepts any integer value for priority - the scheduler processes calls in ascending order (lower numbers first).
Call Deadlines
  • Optional ISO 8601 timestamp specifying when to cancel if not started
  • Supports timezone-aware scheduling
  • Triggers deadline webhook if call exceeds timeout
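Because deadlines are timezone-aware ISO 8601 timestamps, comparing them correctly means comparing aware datetimes, not naive ones. A minimal client-side check (the helper itself is an assumption, not platform code):

```python
# Sketch: checking whether a call's optional ISO 8601 deadline has passed.
# datetime.fromisoformat parses timezone-aware strings like "+01:00".
from datetime import datetime, timezone

def deadline_passed(deadline_iso, now=None):
    """Return True if the deadline is in the past (call would be canceled)."""
    deadline = datetime.fromisoformat(deadline_iso)
    now = now or datetime.now(timezone.utc)
    return now >= deadline

# A deadline given in UTC+1 compared against a fixed UTC reference instant:
ref = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
passed = deadline_passed("2025-01-01T10:00:00+01:00", now=ref)  # 09:00 UTC < 12:00 UTC
```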
DWRR Scheduling
  • Dynamic Weighted Round Robin algorithm ensures fair resource allocation
  • Balances priority with wait time
  • Prevents priority inversion (low priority calls waiting forever)
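The ordering contract — ascending priority, with wait time keeping low-priority calls from starving — can be illustrated with a simplified sort. This is not the actual DWRR algorithm, which is considerably more involved; it only demonstrates the documented ordering behavior.

```python
# Simplified sketch of the scheduling contract: lower priority number
# first, with longer-waiting calls breaking ties. The real DWRR
# scheduler is more sophisticated; this only illustrates the ordering.
def next_calls(queue):
    """queue: list of (call_id, priority, wait_seconds) tuples."""
    return sorted(queue, key=lambda c: (c[1], -c[2]))

queue = [("a", 7, 5), ("b", 0, 1), ("c", 4, 30), ("d", 4, 2)]
order = [call_id for call_id, _, _ in next_calls(queue)]
# "b" (priority 0) runs first; of the two priority-4 calls, "c" has waited longer
```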

Web Integrations

What are Web Integrations? Web integrations allow you to embed your AI agent directly into websites and applications through customizable widgets. Users can interact with your agent via voice or text chat without leaving your site.

Integration Types

Embeddable Widget
  • JavaScript snippet added to your website
  • Customizable appearance (theme and position)
  • Supports voice calls, text chat, or both
  • Feature gating for granular control
API Integration
  • Direct backend integration with your application
  • Token-based authentication
  • Full programmatic control over agent interactions
WebSocket Connection
  • Real-time bidirectional signaling and event streaming
  • Used by widgets for call setup and control messages
  • Audio transport handled by WebRTC (not WebSocket)

Security Features

Origin Restrictions
  • Whitelist specific domains that can use your integration
  • Supports wildcard patterns for subdomains (e.g., *.example.com)
  • Prevents unauthorized access to your agent
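Wildcard subdomain matching of the `*.example.com` kind can be sketched with glob-style matching. This mirrors the documented behavior but is an assumption about the matching semantics, not the platform's implementation.

```python
# Sketch of origin allow-listing with subdomain wildcards, assuming
# "*.example.com"-style patterns match subdomains via glob semantics.
from fnmatch import fnmatch

def origin_allowed(origin_host, patterns):
    """Return True if origin_host matches any allow-listed pattern."""
    return any(origin_host == p or fnmatch(origin_host, p) for p in patterns)

allowed = ["app.example.com", "*.widgets.example.com"]
```

Note that with glob semantics, `*.widgets.example.com` does not match the bare `widgets.example.com` — list the apex host explicitly if you need it.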
Access Tokens
  • Each integration can have multiple tokens
  • Token names help track usage (e.g., “Production Website”, “Staging Environment”)
  • Tokens are always visible in the dashboard - you can copy them anytime
  • Revoke compromised tokens immediately
Feature Gating
  • Enable only the features needed for each integration
  • Available features:
    • AllowWebCall: Voice calls through widget
    • AllowWebChat: Text chat through widget
    • AllowPhoneCall: Phone calls initiated from widget
    • SendCallResult: Send call results back to widget
    • SendToolCallLogs: Send tool execution logs to widget
    • SendTranscriptForAudioCall: Send transcripts to widget
Never use wildcard * for all origins in production. Always specify exact domains or subdomain patterns.
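On the client side, feature gating amounts to checking a capability against the integration's enabled flags before attempting it. The feature names below are documented; the config shape is assumed.

```python
# Sketch: gating widget capabilities by the integration's feature flags.
# Feature names come from the docs; the config representation is assumed.
ENABLED_FEATURES = {"AllowWebCall", "AllowWebChat"}  # this integration's flags

def feature_enabled(name, enabled=ENABLED_FEATURES):
    """Check a capability before exposing the corresponding UI."""
    return name in enabled

can_voice = feature_enabled("AllowWebCall")    # enabled above
can_phone = feature_enabled("AllowPhoneCall")  # not enabled above
```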

Widget Customization

Theme
  • Light or dark mode
Position
  • bottom-right (most common)
  • bottom-left
  • bottom-center
  • top-right
  • top-left
  • top-center

Webhooks

What are Webhooks? Webhooks provide real-time HTTP notifications when specific call events occur. Your server receives POST requests with detailed payload information, enabling custom integrations and workflows.

Webhook Types

Result Webhook (CompletedWebHookPayload)
  • Triggered when call completes successfully
  • Contains full conversation transcript
  • Includes post-call analysis results (if configured)
  • Most commonly used for logging and analytics
Start Webhook (StartWebHookPayload)
  • Triggered when call begins
  • Requires fallback response if webhook fails
  • Used for dynamic configuration based on caller context
  • Can modify agent behavior per-call
Failed Webhook (FailedWebHookPayload)
  • Triggered when call fails due to error
  • Contains error details and failure reason
  • Useful for alerting and troubleshooting
Deadline Webhook (CallDeadLineWebHookPayload)
  • Triggered when call exceeds configured timeout
  • Indicates call was cancelled before starting
  • Helps identify scheduling or queue issues
Tool Webhook (ToolWebHookPayload)
  • Triggered when agent calls a function/tool
  • Your webhook must return result within timeout
  • Enables real-time API integrations during conversations
  • Supports synchronous external data lookups
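A webhook endpoint typically dispatches on the payload type and routes each to its own handler. The payload type names are from this page; the "type" discriminator field and "callId" key are assumptions — check the actual payload schema for the real field names.

```python
# Sketch of a webhook handler dispatching on the documented payload
# types. The "type" and "callId" field names are assumptions.
def handle_webhook(payload):
    handlers = {
        "CompletedWebHookPayload": lambda p: f"log transcript for {p['callId']}",
        "FailedWebHookPayload": lambda p: f"alert: call {p['callId']} failed",
        "CallDeadLineWebHookPayload": lambda p: f"deadline missed for {p['callId']}",
    }
    handler = handlers.get(payload.get("type"))
    return handler(payload) if handler else "ignored"

result = handle_webhook({"type": "FailedWebHookPayload", "callId": "c-42"})
```

In production this function would sit behind an HTTP endpoint receiving the POST requests; remember that start and tool webhooks must respond within their timeouts.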

Use Cases

  • CRM Integration: Log call results to Salesforce, HubSpot, etc.
  • Custom Analytics: Send call data to your analytics platform
  • Dynamic Routing: Use start webhook to configure agent per-caller
  • External Tool Calls: Fetch customer data, check inventory, book appointments
  • Alerting: Notify team when calls fail or important events occur
Use the /api/v1/webhooks/test endpoint to validate your webhook handlers before deploying to production.

Tools & Functions

What are Tools? Tools enable your agent to perform actions beyond conversation - calling external APIs, looking up data, executing functions. Think of tools as giving your agent “superpowers” to interact with the real world.

Tool Types

Webhook Tools
  • Custom functions you define with JSON schema
  • Agent calls your webhook when tool is needed
  • Webhook returns result to agent
  • Agent incorporates result into conversation
MCP Tools (Model Context Protocol)
  • Connect to MCP-compatible servers
  • Auto-discover available tools from server
  • Standard protocol for tool integration
  • Verify connections before deployment

Tool Configuration

Tool Schema (JSON Schema format)
{
  "type": "object",
  "properties": {
    "location": {
      "type": "string",
      "description": "City name"
    },
    "units": {
      "type": "string",
      "enum": ["celsius", "fahrenheit"]
    }
  },
  "required": ["location"]
}
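To make the schema concrete, here is a minimal stdlib-only check of tool parameters against the schema above — a sketch of what the platform's validation step does conceptually, not its actual implementation (a real service would use a full JSON Schema validator).

```python
# Minimal validation of tool parameters against the schema shown above,
# stdlib only. Illustrative sketch; use a JSON Schema library in practice.
SCHEMA = {
    "type": "object",
    "properties": {
        "location": {"type": "string", "description": "City name"},
        "units": {"type": "string", "enum": ["celsius", "fahrenheit"]},
    },
    "required": ["location"],
}

def validate(params, schema=SCHEMA):
    """Return a list of validation error messages (empty means valid)."""
    errors = []
    for field in schema.get("required", []):
        if field not in params:
            errors.append(f"missing required field: {field}")
    for name, value in params.items():
        spec = schema["properties"].get(name)
        if spec is None:
            continue  # unknown properties are ignored here
        if spec["type"] == "string" and not isinstance(value, str):
            errors.append(f"{name}: expected string")
        if "enum" in spec and value not in spec["enum"]:
            errors.append(f"{name}: must be one of {spec['enum']}")
    return errors
```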
Tool Description
  • Clear, concise description of what the tool does
  • Agent uses description to decide when to call tool
  • Be specific about expected inputs and outputs
Fallback Result
  • Returned if webhook fails or times out
  • Prevents conversation from breaking on tool failures
  • Should provide graceful degradation

Function Calling Flow

  1. User asks question requiring external data
  2. Agent determines which tool to call based on descriptions
  3. Agent extracts parameters from conversation
  4. BlackBox validates parameters against tool schema
  5. Webhook called with tool name and parameters
  6. Your server processes request and returns result
  7. Agent receives result and incorporates into response
  8. Conversation continues naturally
Tools are optional. Simple conversational agents don’t need tools - only use them when your agent needs to interact with external systems.
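The eight steps above can be sketched end to end with a fake weather tool. Everything here is hypothetical (the tool, its data, the hardcoded parameter extraction); in production, step 5 is an HTTP call from BlackBox to your webhook.

```python
# End-to-end sketch of the function calling flow with a fake tool.
# The tool, its data, and the parameter extraction are all hypothetical.
def weather_tool(params):
    # steps 5-6: your server receives the call and processes the request
    data = {"Oslo": "-3°C, snow"}
    return data.get(params["location"], "unknown")

def agent_turn(user_text):
    # steps 2-4: decide on the tool and extract/validate parameters
    # (hardcoded here; the LLM does this from the conversation)
    params = {"location": "Oslo"}
    result = weather_tool(params)
    # steps 7-8: the agent weaves the result into its reply
    return f"It's currently {result} in {params['location']}."

reply = agent_turn("What's the weather in Oslo?")
```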

Voice Configuration

Voice Components
BlackBox uses three core components for voice interactions:

Text-to-Speech (TTS)

Converts the agent’s text responses into natural-sounding speech.
Supported Providers
| Provider | Key Features | Speed Range | Specialty |
| --- | --- | --- | --- |
| ElevenLabs | Premium quality, emotion control, multilingual | 0.7x - 1.2x | Professional voiceovers, voice cloning |
| Cartesia | Emotion presets, sonic model | 0.0x - 2.0x | Emotional range (anger, positivity, surprise, sadness, curiosity) |
| Dasha | Optimized for real-time, low latency | 0.25x - 4.0x | Conversational AI, fast response |
| Inworld | Character voices, game-ready | 0.8x - 1.5x | Gaming, interactive experiences |
| LMNT | Natural voices, blizzard model | 1.0x (fixed) | Natural conversation flow |
Provider-Specific Options
ElevenLabs:
  • Similarity Boost (0.0 - 1.0): Voice clarity enhancement
  • Stability (0.0 - 1.0): Consistency of voice characteristics
  • Style (0.0 - 1.0): Expressiveness level
  • Speaker Boost: Enable for clearer output
  • Streaming Latency Optimization (0-4): Higher = faster but lower quality
Cartesia:
  • Emotions: Mix multiple emotion levels
    • Anger: lowest, low, high, highest
    • Positivity: lowest, low, high, highest
    • Surprise: lowest, low, high, highest
    • Sadness: lowest, low, high, highest
    • Curiosity: lowest, low, high, highest
Inworld:
  • Temperature: Voice variation control
  • Pitch: Voice pitch adjustment
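A practical consequence of the per-provider speed ranges in the table above is that a requested speed should be clamped before being sent. The ranges come from the table; the helper itself is an illustrative assumption.

```python
# Sketch: clamp a requested TTS speed to the provider's documented range.
# Ranges are from the provider table above; the helper is illustrative.
SPEED_RANGES = {
    "elevenlabs": (0.7, 1.2),
    "cartesia":   (0.0, 2.0),
    "dasha":      (0.25, 4.0),
    "inworld":    (0.8, 1.5),
    "lmnt":       (1.0, 1.0),  # fixed speed
}

def clamp_speed(provider, requested):
    """Return the nearest speed the provider actually supports."""
    lo, hi = SPEED_RANGES[provider]
    return max(lo, min(hi, requested))
```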

Automatic Speech Recognition (ASR)

Converts the caller’s speech into text for the LLM to process.
Available Options
  • Auto (default): Automatically selects best provider based on language and availability
  • Deepgram: High accuracy, real-time streaming, supports 36+ languages
  • Microsoft: Azure Speech Services, excellent for enterprise scenarios
Use “Auto” ASR unless you have specific provider requirements. The system automatically optimizes for your agent’s primary language.

Voice Cloning

Create custom voices from audio samples. Supported providers: ElevenLabs, Cartesia, and Dasha.
Voice Cloning Process
  1. Upload clear audio sample (30 seconds to 10 minutes, high quality)
  2. Provider analyzes and creates voice model
  3. Cloned voice becomes available in voice selector
  4. Test with preview before deploying to agent
  5. Update or delete cloned voices as needed
Best Practices
  • Use high-quality recordings (clear audio, minimal background noise)
  • Single speaker only in sample
  • Consistent tone and speaking style
  • 1-2 minutes of audio recommended for best results (minimum 30 seconds)
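The length limits and recommendation above can be encoded as a simple pre-upload check. The thresholds are from this page; the helper is illustrative.

```python
# Sketch: pre-upload check that a cloning sample meets the documented
# limits (30 seconds to 10 minutes; 1-2 minutes recommended).
def check_sample(duration_s):
    """Return a verdict string for a sample of the given length in seconds."""
    if duration_s < 30:
        return "too short (minimum 30 seconds)"
    if duration_s > 600:
        return "too long (maximum 10 minutes)"
    if not 60 <= duration_s <= 120:
        return "ok, but 1-2 minutes is recommended for best results"
    return "ok"
```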

Platform Architecture

BlackBox is built on top of the Dasha.ai conversational AI platform, providing enterprise-grade voice AI capabilities through a simple REST API.

Architecture Overview

Integration Points

REST API
  • Full CRUD operations for agents and integrations
  • Create, list, and cancel calls (no update operations)
  • Synchronous request/response for configuration and data retrieval
  • OpenAPI/Swagger specification for all endpoints
Webhooks
  • Real-time event notifications (call started, completed, failed)
  • Tool/function call handlers for agent actions
WebSocket
  • Signaling and event streaming for call monitoring
  • Used alongside WebRTC for real-time voice communication
  • WebRTC handles actual audio transport with low latency
Dasha Platform Benefits
  • Enterprise-grade reliability and scalability
  • Global telephony infrastructure
  • Multi-provider AI orchestration
  • Production-ready voice AI technology
Learn more about the underlying Dasha platform at dasha.ai.

Workflow Overview

The typical BlackBox workflow follows this lifecycle:

1. Build

Create and configure your AI agent with desired personality, voice, and capabilities.
  • Define system prompt and role
  • Select LLM vendor and model
  • Choose TTS voice and provider
  • Configure tools and functions (optional)
  • Set business hours and schedule (optional)
  • Enable advanced features (optional)

2. Test

Test your agent using dashboard testing tools to ensure correct behavior.
  • Use test widget for browser-based calls
  • Review conversation transcripts
  • Debug with developer toolbar
  • Iterate on configuration based on test results

3. Deploy

Connect phone numbers or embed web widgets to make your agent available to users.
  • For Phone: Configure SIP settings, assign phone number
  • For Web: Create web integration, generate token, embed widget
  • For Both: Enable agent and verify routing

4. Monitor

Track performance, analyze call transcripts, and optimize based on real usage data.
  • Review call history and transcripts
  • Analyze agent performance metrics
  • Monitor concurrency and resource usage
  • Optimize based on post-call analysis

Key Benefits

For Developers

  • REST API: Full programmatic control over all platform features
  • Webhooks: Real-time event notifications for custom integrations
  • WebSocket: Low-latency bidirectional communication
  • Scalability: Organization-specific concurrency limits (contact support for capacity planning)
  • Flexibility: Choose from multiple LLM and voice providers
  • Standards: MCP support for tool integrations

For Business Users

  • No-Code Interface: Create agents without programming knowledge
  • Quick Setup: Agent creation through dashboard interface
  • Visual Dashboard: Monitor and manage everything through intuitive UI
  • Multi-Channel: Deploy to phone, web, or both simultaneously
  • Analytics: Built-in performance tracking and call analysis
  • Professional Support: Dedicated support team for enterprise needs

Next Steps

Now that you understand the key concepts, you’re ready to:

API Cross-References