Key Concepts
Understanding these core concepts will help you get the most out of BlackBox’s AI voice agent platform.
Agents
What is an Agent? An agent is a conversational AI configured to handle voice interactions. Think of it as your AI representative that can conduct natural conversations with callers over the phone or through web interfaces.
Agent Components
Every agent consists of several key components (a request sketch follows this list):
- System Prompt: Instructions that define your agent’s personality, role, and behavior guidelines
- Language Model (LLM): The AI engine that powers conversation intelligence - OpenAI GPT-4, Groq, xAI Grok, DeepSeek, or custom providers
- Voice Configuration: Text-to-speech voice, speed, and provider-specific settings (ElevenLabs, Cartesia, Dasha, Inworld, LMNT)
- Speech Recognition: Automatic speech recognition (ASR) for understanding caller speech - Auto, Deepgram, or Microsoft
- Tools & Functions: External API integrations your agent can call during conversations
- Scheduling: Business hours, timezone, and availability settings
- Advanced Features: Call transfers, post-call analysis, ambient noise handling
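As a rough illustration, the sketch below assembles these components into a create-agent request against POST /api/v1/agents (listed under API Cross-References). The base URL, the bearer-token header, and every field name other than isEnabled are assumptions made for illustration; consult the OpenAPI specification for the actual request schema.

```typescript
// Sketch: create an agent with the components described above.
// Base URL, auth header, and all field names except isEnabled are illustrative assumptions.
const API_BASE = "https://api.blackbox.example.com"; // hypothetical base URL
const API_KEY = process.env.BLACKBOX_API_KEY ?? "";

async function createAgent(): Promise<void> {
  const res = await fetch(`${API_BASE}/api/v1/agents`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${API_KEY}`, // auth scheme is an assumption
    },
    body: JSON.stringify({
      name: "Support Agent",
      systemPrompt: "You are a friendly, concise support representative.",
      llm: { vendor: "openai", model: "gpt-4" },     // illustrative vendor/model values
      voice: { provider: "elevenlabs", speed: 1.0 }, // illustrative TTS settings
      asr: "auto",                                   // Auto, Deepgram, or Microsoft
      isEnabled: true,                               // documented field: enabled by default
    }),
  });
  if (!res.ok) throw new Error(`Agent creation failed: ${res.status}`);
  console.log(await res.json());
}

createAgent().catch(console.error);
```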
Agent Lifecycle States
Agents have a simple enabled/disabled toggle controlled by the isEnabled boolean field (a toggle sketch follows the state descriptions below):
Enabled (isEnabled: true)
- Agent is active and ready to take calls
- Default state - new agents are created enabled unless explicitly disabled
- Configuration changes take effect immediately
- Available for inbound calls, outbound calls, and web integrations
- Billable when handling calls
Disabled (isEnabled: false)
- Temporarily inactive (not deleted)
- Configuration preserved for future use
- Not available for any call routing
- No billing when disabled
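Because enablement is a single boolean, taking an agent offline for maintenance can be a one-field update. The sketch below assumes a PATCH endpoint on the agent resource and bearer-token auth, neither of which is confirmed here; only the isEnabled field itself comes from the documentation above.

```typescript
// Minimal sketch of disabling and re-enabling an agent via the documented isEnabled field.
// The PATCH verb, URL, and Authorization header are assumptions for illustration.
async function setAgentEnabled(agentId: string, enabled: boolean): Promise<void> {
  const res = await fetch(`https://api.blackbox.example.com/api/v1/agents/${agentId}`, {
    method: "PATCH",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.BLACKBOX_API_KEY}`,
    },
    body: JSON.stringify({ isEnabled: enabled }),
  });
  if (!res.ok) throw new Error(`Update failed: ${res.status}`);
}

// Disabled agents keep their configuration, receive no calls, and incur no billing.
setAgentEnabled("agent-123", false).catch(console.error);
```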
Primary Language Support
BlackBox supports 40 languages:
- English Variants: en-US, en-GB, en-AU, en-CA
- European Languages: German, French (France & Canada), Spanish (Spain & Mexico), Portuguese (Brazil & Portugal), Italian, Dutch, Turkish, Polish, Swedish, Bulgarian, Romanian, Czech, Greek, Finnish, Croatian, Slovak, Danish, Ukrainian, Russian, Hungarian, Norwegian
- Asian Languages: Japanese, Chinese, Korean, Hindi, Indonesian, Filipino, Malay, Tamil, Vietnamese, Thai
- Arabic: ar-SA, ar-AE
Calls
What is a Call? A call represents a single voice interaction between your agent and a caller. Calls can be inbound (received), outbound (initiated by your agent), or test calls made during development.
Call Types
Inbound Calls
- Calls received by your agent when someone dials your phone number or SIP URI
- Handled automatically by enabled agents
- Routed based on agent availability and schedule
- Support call recording and real-time transcription
Outbound Calls
- Calls initiated by your agent to phone numbers you specify
- Scheduled via API or dashboard interface
- Support single or bulk call scheduling
- Priority-based queue management with DWRR (Dynamic Weighted Round Robin) scheduling
Test Calls
- Browser-based calls for testing agent behavior
- No phone number required
- Uses WebRTC for audio streaming, WebSocket for signaling
- Ideal for development and debugging
Call Lifecycle
Every call progresses through distinct states:
- Created: Call scheduled but not yet queued
- Pending: Call waiting for queue admission
- Queued: Call in queue waiting for available resources
- Running: Call currently active
- Completed: Call finished successfully
- Failed: Call encountered an error
- Canceled: Call removed from queue before completion
Call Priority and Scheduling
Priority Levels: Integer value where lower numbers mean higher urgency; calls are scheduled in ascending order of priority value (a scheduling sketch follows this section).
- Priority 0: Highest urgency, scheduled first
- Priority 4: Normal priority (API default when not specified)
- Priority 7: Default in UI scheduling forms
- Priority 10: Lowest urgency, scheduled last
Deadline
- Optional ISO 8601 timestamp specifying when to cancel the call if it has not started
- Supports timezone-aware scheduling
- Triggers deadline webhook if call exceeds timeout
DWRR Scheduling
- Dynamic Weighted Round Robin algorithm ensures fair resource allocation
- Balances priority with wait time
- Prevents starvation, so low-priority calls never wait forever
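To make the priority and deadline concepts concrete, here is a minimal sketch of scheduling an outbound call through POST /api/v1/calls. The agentId, phoneNumber, and deadline field names are assumptions; the priority semantics (0 = highest urgency, API default 4) are as documented above.

```typescript
// Sketch: queue an urgent outbound call with an ISO 8601 cancellation deadline.
// Field names other than "priority" are illustrative assumptions, not the confirmed schema.
async function scheduleCall(): Promise<void> {
  const res = await fetch("https://api.blackbox.example.com/api/v1/calls", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.BLACKBOX_API_KEY}`,
    },
    body: JSON.stringify({
      agentId: "agent-123",                  // assumed field: which agent places the call
      phoneNumber: "+15551234567",           // assumed field: destination number
      priority: 0,                           // documented semantics: 0 = highest urgency (API default is 4)
      deadline: "2025-06-01T17:00:00-05:00", // assumed field: cancel if not started by this time
    }),
  });
  console.log(res.ok ? "Call queued" : `Scheduling failed: ${res.status}`);
}

scheduleCall().catch(console.error);
```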
Web Integrations
What are Web Integrations? Web integrations allow you to embed your AI agent directly into websites and applications through customizable widgets. Users can interact with your agent via voice or text chat without leaving your site.
Integration Types
Embeddable Widget
- JavaScript snippet added to your website
- Customizable appearance (theme and position)
- Supports voice calls, text chat, or both
- Feature gating for granular control
API Integration
- Direct backend integration with your application
- Token-based authentication
- Full programmatic control over agent interactions
WebSocket Connection
- Real-time bidirectional signaling and event streaming
- Used by widgets for call setup and control messages
- Audio transport handled by WebRTC (not WebSocket)
Security Features
Origin Restrictions
- Whitelist specific domains that can use your integration
- Supports wildcard patterns for subdomains (e.g., *.example.com)
- Prevents unauthorized access to your agent
Access Tokens
- Each integration can have multiple tokens
- Token names help track usage (e.g., “Production Website”, “Staging Environment”)
- Tokens are always visible in the dashboard - you can copy them anytime
- Revoke compromised tokens immediately
Feature Gating
- Enable only the features needed for each integration (see the sketch after this feature list)
- Available features:
  - AllowWebCall: Voice calls through widget
  - AllowWebChat: Text chat through widget
  - AllowPhoneCall: Phone calls initiated from widget
  - SendCallResult: Send call results back to widget
  - SendToolCallLogs: Send tool execution logs to widget
  - SendTranscriptForAudioCall: Send transcripts to widget
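As a rough sketch of how origin restrictions and feature gating come together, the request below creates a web integration via POST /api/v1/web-integrations with a domain whitelist and only the features this deployment needs. The allowedOrigins and features field names are assumptions; the feature names themselves are the documented ones.

```typescript
// Sketch: create a locked-down web integration (assumed request shape).
async function createWebIntegration(): Promise<void> {
  const res = await fetch("https://api.blackbox.example.com/api/v1/web-integrations", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.BLACKBOX_API_KEY}`,
    },
    body: JSON.stringify({
      agentId: "agent-123",
      allowedOrigins: ["https://www.example.com", "*.example.com"], // wildcard subdomains are supported
      features: ["AllowWebCall", "AllowWebChat", "SendTranscriptForAudioCall"], // documented feature names
    }),
  });
  if (!res.ok) throw new Error(`Integration creation failed: ${res.status}`);
  console.log(await res.json()); // the created integration and its token(s) are expected in the response
}

createWebIntegration().catch(console.error);
```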
Widget Customization
Theme
- Light or dark mode
Position
- bottom-right (most common)
- bottom-left
- bottom-center
- top-right
- top-left
- top-center
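A widget embed then ties the token, theme, and position together. The global object name and option keys below are hypothetical placeholders for whatever snippet the dashboard generates; only the theme and position values come from the lists above.

```typescript
// Hypothetical widget initialization; replace with the snippet generated by your dashboard.
declare const BlackBoxWidget: {
  init(options: { token: string; theme: "light" | "dark"; position: string }): void;
};

BlackBoxWidget.init({
  token: "YOUR_INTEGRATION_TOKEN", // copy from the dashboard; revoke and rotate if compromised
  theme: "dark",                   // light or dark mode
  position: "bottom-right",        // the most common placement
});
```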
Webhooks
What are Webhooks? Webhooks provide real-time HTTP notifications when specific call events occur. Your server receives POST requests with detailed payload information, enabling custom integrations and workflows.
Webhook Types
Result Webhook (CompletedWebHookPayload)
- Triggered when call completes successfully
- Contains full conversation transcript
- Includes post-call analysis results (if configured)
- Most commonly used for logging and analytics
Start Webhook (StartWebHookPayload)
- Triggered when call begins
- Requires fallback response if webhook fails
- Used for dynamic configuration based on caller context
- Can modify agent behavior per-call
Failed Webhook (FailedWebHookPayload)
- Triggered when call fails due to error
- Contains error details and failure reason
- Useful for alerting and troubleshooting
Deadline Webhook (CallDeadLineWebHookPayload)
- Triggered when call exceeds configured timeout
- Indicates call was cancelled before starting
- Helps identify scheduling or queue issues
Tool Webhook (ToolWebHookPayload)
- Triggered when agent calls a function/tool
- Your webhook must return result within timeout
- Enables real-time API integrations during conversations
- Supports synchronous external data lookups
Use Cases
- CRM Integration: Log call results to Salesforce, HubSpot, etc.
- Custom Analytics: Send call data to your analytics platform
- Dynamic Routing: Use start webhook to configure agent per-caller
- External Tool Calls: Fetch customer data, check inventory, book appointments
- Alerting: Notify team when calls fail or important events occur
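For example, a minimal receiver for the result and failure webhooks might look like the sketch below. The payload field names (transcript, reason) are assumptions; only the webhook types themselves are documented above.

```typescript
// Sketch of a webhook receiver: log completed calls, alert on failures.
import { createServer } from "node:http";

createServer((req, res) => {
  let body = "";
  req.on("data", (chunk) => (body += chunk));
  req.on("end", () => {
    const payload = JSON.parse(body || "{}");
    if (req.url === "/webhooks/result") {
      // CompletedWebHookPayload: forward transcript and analysis to your analytics store
      console.log("Call completed:", payload.transcript); // "transcript" is an assumed field name
    } else if (req.url === "/webhooks/failed") {
      // FailedWebHookPayload: notify your team of the failure reason
      console.error("Call failed:", payload.reason); // "reason" is an assumed field name
    }
    res.writeHead(200, { "Content-Type": "application/json" });
    res.end(JSON.stringify({ ok: true }));
  });
}).listen(3000, () => console.log("Webhook receiver listening on :3000"));
```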
Tools & Functions
What are Tools? Tools enable your agent to perform actions beyond conversation - calling external APIs, looking up data, executing functions. Think of tools as giving your agent “superpowers” to interact with the real world.
Tool Types
Webhook Tools
- Custom functions you define with JSON schema
- Agent calls your webhook when tool is needed
- Webhook returns result to agent
- Agent incorporates result into conversation
MCP Tools
- Connect to MCP-compatible servers
- Auto-discover available tools from server
- Standard protocol for tool integration
- Verify connections before deployment
Tool Configuration
Tool Schema (JSON Schema format)
- Clear, concise description of what the tool does (an example definition follows this list)
- Agent uses description to decide when to call tool
- Be specific about expected inputs and outputs
Fallback Response
- Returned if the webhook fails or times out
- Prevents conversation from breaking on tool failures
- Should provide graceful degradation
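A tool definition might look like the sketch below. The JSON Schema parameters block is standard; the surrounding field names (name, description, parameters, fallbackResponse) are assumptions about how BlackBox registers a webhook tool.

```typescript
// Sketch of a webhook tool definition (assumed wrapper fields around standard JSON Schema).
export const checkOrderStatusTool = {
  name: "check_order_status",
  description:
    "Look up the current status of a customer's order. Call this whenever the caller asks where their order is.",
  parameters: {
    type: "object",
    properties: {
      orderId: { type: "string", description: "The order number the caller provides" },
    },
    required: ["orderId"],
  },
  // Returned to the agent if the webhook fails or times out, so the conversation degrades gracefully.
  fallbackResponse:
    "I'm unable to check order status right now. Please try again in a few minutes.",
};
```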
Function Calling Flow
1. User asks a question requiring external data
2. Agent determines which tool to call based on tool descriptions
3. Agent extracts parameters from the conversation
4. BlackBox validates parameters against the tool schema
5. Webhook is called with the tool name and parameters
6. Your server processes the request and returns a result
7. Agent receives the result and incorporates it into its response
8. Conversation continues naturally
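The server side of steps 5-6 can be as small as the sketch below: receive the tool call, run the lookup, and return a result quickly enough to stay within the tool timeout. The payload and response shapes are assumptions; adapt them to the actual ToolWebHookPayload contract.

```typescript
// Sketch of a tool webhook handler (assumed payload and response envelope).
import { createServer } from "node:http";

createServer((req, res) => {
  let body = "";
  req.on("data", (chunk) => (body += chunk));
  req.on("end", () => {
    const { toolName, parameters } = JSON.parse(body || "{}"); // assumed payload fields
    let result: unknown = { error: "unknown tool" };
    if (toolName === "check_order_status") {
      // Replace this stub with a real database or API lookup.
      result = { orderId: parameters.orderId, status: "shipped", eta: "2 days" };
    }
    res.writeHead(200, { "Content-Type": "application/json" });
    res.end(JSON.stringify({ result })); // assumed response envelope
  });
}).listen(3001, () => console.log("Tool webhook listening on :3001"));
```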
Tools are optional. Simple conversational agents don’t need tools - only use them when your agent needs to interact with external systems.
Voice Configuration
Voice Components
BlackBox uses three core components for voice interactions:
Text-to-Speech (TTS)
Converts agent’s text responses into natural-sounding speech.
Supported Providers
| Provider | Key Features | Speed Range | Specialty |
|---|---|---|---|
| ElevenLabs | Premium quality, emotion control, multilingual | 0.7x - 1.2x | Professional voiceovers, voice cloning |
| Cartesia | Emotion presets, sonic model | 0.0x - 2.0x | Emotional range (anger, positivity, surprise, sadness, curiosity) |
| Dasha | Optimized for real-time, low latency | 0.25x - 4.0x | Conversational AI, fast response |
| Inworld | Character voices, game-ready | 0.8x - 1.5x | Gaming, interactive experiences |
| LMNT | Natural voices, blizzard model | 1.0x (fixed) | Natural conversation flow |
ElevenLabs Settings
- Similarity Boost (0.0 - 1.0): Voice clarity enhancement
- Stability (0.0 - 1.0): Consistency of voice characteristics
- Style (0.0 - 1.0): Expressiveness level
- Speaker Boost: Enable for clearer output
- Streaming Latency Optimization (0-4): Higher = faster but lower quality
Cartesia Settings
- Emotions: Mix multiple emotion levels
  - Anger: lowest, low, high, highest
  - Positivity: lowest, low, high, highest
  - Surprise: lowest, low, high, highest
  - Sadness: lowest, low, high, highest
  - Curiosity: lowest, low, high, highest
Other Provider Settings
- Temperature: Voice variation control
- Pitch: Voice pitch adjustment
Automatic Speech Recognition (ASR)
Converts caller’s speech into text for the LLM to process.
Available Options
- Auto (default): Automatically selects best provider based on language and availability
- Deepgram: High accuracy, real-time streaming, supports 36+ languages
- Microsoft: Azure Speech Services, excellent for enterprise scenarios
Voice Cloning
Create custom voices from audio samples. Supported providers: ElevenLabs, Cartesia, and Dasha.
Voice Cloning Process
1. Upload clear audio sample (30 seconds to 10 minutes, high quality)
2. Provider analyzes and creates voice model
3. Cloned voice becomes available in voice selector
4. Test with preview before deploying to agent
5. Update or delete cloned voices as needed
Best Practices
- Use high-quality recordings (clear audio, minimal background noise)
- Single speaker only in sample
- Consistent tone and speaking style
- 1-2 minutes of audio recommended for best results (minimum 30 seconds)
Platform Architecture
BlackBox is built on top of the Dasha.ai conversational AI platform, providing enterprise-grade voice AI capabilities through a simple REST API.
Architecture Overview
Integration Points
REST API
- Full CRUD operations for agents and integrations
- Create, list, and cancel calls (no update operations)
- Synchronous request/response for configuration and data retrieval
- OpenAPI/Swagger specification for all endpoints
Webhooks
- Real-time event notifications (call started, completed, failed)
- Tool/function call handlers for agent actions
WebSocket
- Signaling and event streaming for call monitoring
- Used alongside WebRTC for real-time voice communication
- WebRTC handles actual audio transport with low latency
Dasha.ai Platform
- Enterprise-grade reliability and scalability
- Global telephony infrastructure
- Multi-provider AI orchestration
- Production-ready voice AI technology
Workflow Overview
The typical BlackBox workflow follows this lifecycle:
1. Build
Create and configure your AI agent with desired personality, voice, and capabilities.
- Define system prompt and role
- Select LLM vendor and model
- Choose TTS voice and provider
- Configure tools and functions (optional)
- Set business hours and schedule (optional)
- Enable advanced features (optional)
2. Test
Test your agent using dashboard testing tools to ensure correct behavior.
- Use test widget for browser-based calls
- Review conversation transcripts
- Debug with developer toolbar
- Iterate on configuration based on test results
3. Deploy
Connect phone numbers or embed web widgets to make your agent available to users.
- For Phone: Configure SIP settings, assign phone number
- For Web: Create web integration, generate token, embed widget
- For Both: Enable agent and verify routing
4. Monitor
Track performance, analyze call transcripts, and optimize based on real usage data.
- Review call history and transcripts
- Analyze agent performance metrics
- Monitor concurrency and resource usage
- Optimize based on post-call analysis
Key Benefits
For Developers
- REST API: Full programmatic control over all platform features
- Webhooks: Real-time event notifications for custom integrations
- WebSocket: Low-latency bidirectional communication
- Scalability: Organization-specific concurrency limits (contact support for capacity planning)
- Flexibility: Choose from multiple LLM and voice providers
- Standards: MCP support for tool integrations
For Business Users
- No-Code Interface: Create agents without programming knowledge
- Quick Setup: Agent creation through dashboard interface
- Visual Dashboard: Monitor and manage everything through intuitive UI
- Multi-Channel: Deploy to phone, web, or both simultaneously
- Analytics: Built-in performance tracking and call analysis
- Professional Support: Dedicated support team for enterprise needs
Next Steps
Now that you understand the key concepts, you’re ready to:
- Create Your First Agent - Step-by-step agent creation guide
- Quick Start Guide - Get up and running in 5 minutes
- Test Your Agent - Learn testing tools and best practices
- Deploy Your Agent - Make your agent available to users
API Cross-References
- GET /api/v1/agents - List all agents
- POST /api/v1/agents - Create new agent
- POST /api/v1/calls - Schedule outbound call
- GET /api/v1/voice - List available voices
- POST /api/v1/web-integrations - Create web integration
- POST /api/v1/webhooks/test - Test webhooks
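As a closing sketch, the snippet below reads configuration back through two of the endpoints listed above. The base URL, bearer-token auth, and array-shaped responses are assumptions; the paths are the documented ones.

```typescript
// Sketch: list agents and available voices (assumed auth and response shape).
const API = "https://api.blackbox.example.com";
const headers = { Authorization: `Bearer ${process.env.BLACKBOX_API_KEY}` };

async function listResources(): Promise<void> {
  const agents = await fetch(`${API}/api/v1/agents`, { headers }).then((r) => r.json());
  const voices = await fetch(`${API}/api/v1/voice`, { headers }).then((r) => r.json());
  console.log(`Found ${agents.length} agents and ${voices.length} voices`);
}

listResources().catch(console.error);
```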