LLM Configuration
The Large Language Model (LLM) is the brain of your AI agent. It processes conversations, generates responses, and makes decisions based on your system prompt and configuration. BlackBox supports multiple LLM vendors, each offering different models with varying capabilities, speeds, and cost structures.
Supported LLM Vendors
BlackBox integrates with five LLM vendors:
| Vendor | Best For | Context Window | Key Advantage |
|---|---|---|---|
| OpenAI | Production use, reliability | 128k tokens | Industry-leading quality and reasoning |
| Groq | Ultra-fast inference | 32k tokens | Fastest response times available |
| Grok (xAI) | Advanced reasoning | 128k tokens | Latest AI innovations from xAI |
| DeepSeek | Cost-efficiency | 128k tokens | 30x more cost-efficient than GPT-4 |
| Custom Compatible | Self-hosted models | Varies | Full control and customization |
OpenAI
OpenAI provides the most widely-used and battle-tested LLMs for conversational AI.
Available Models
GPT-4.1 Series (Latest)
gpt-4.1
- Latest flagship model with enhanced reasoning
- Context Window: 128k tokens
- Best For: Complex conversations requiring deep understanding
- Use Cases: Customer support, sales qualification, medical assistance
- Cost Tier: Premium
gpt-4.1-mini
- Faster, more cost-effective version of GPT-4.1
- Context Window: 128k tokens
- Best For: High-volume applications with balanced quality/cost
- Use Cases: Lead qualification, appointment scheduling, FAQs
- Cost Tier: Mid
gpt-4.1-nano
- Ultra-fast with minimal latency
- Context Window: 128k tokens
- Best For: Real-time voice conversations requiring instant responses
- Use Cases: Quick Q&A, simple routing, basic information gathering
- Cost Tier: Low
GPT-4o Series (Multimodal)
gpt-4o
- Multimodal model supporting vision and audio
- Context Window: 128k tokens
- Best For: Applications requiring image understanding
- Use Cases: Visual product support, document analysis
- Cost Tier: Premium
gpt-4o-mini
- Affordable multimodal capabilities
- Context Window: 128k tokens
- Best For: Cost-conscious multimodal applications
- Cost Tier: Mid
Reasoning Models
o3-mini
- Specialized reasoning model
- Context Window: 128k tokens
- Best For: Complex problem-solving and logical reasoning
- Use Cases: Technical troubleshooting, decision trees
- Cost Tier: Premium
OpenAI-Specific Options
Service Tier
OpenAI offers a priority service tier for lower latency at potentially higher cost. This is controlled via a checkbox in the UI labeled “Use priority tier (lower latency)”.
When enabled, the priority tier is set in vendorSpecificOptions:
const agent = {
config: {
llmConfig: {
vendor: "openai",
model: "gpt-4.1",
vendorSpecificOptions: {
service_tier: "priority"
}
}
}
};
Behavior:
- Enabled (checkbox checked): Sets vendorSpecificOptions.service_tier to "priority"
  - Lower latency and higher request priority
  - May increase API costs
  - Recommended for latency-sensitive production agents
- Disabled (checkbox unchecked): The service_tier field is omitted entirely
  - Standard OpenAI behavior (default tier)
  - Best for cost-conscious applications
The priority tier is OpenAI-specific and only available when using OpenAI as your LLM vendor. If you switch vendors, this setting is automatically cleared.
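If you build this configuration programmatically, one way to mirror the checkbox behavior is to include the field only when the priority tier is wanted. A minimal sketch, assuming a usePriorityTier flag of your own (it is not part of the BlackBox API):
// usePriorityTier is an illustrative flag, e.g. read from your own settings store
const usePriorityTier = true;

const llmConfig = {
  vendor: "openai",
  model: "gpt-4.1",
  // The conditional spread adds service_tier only when the flag is on;
  // otherwise vendorSpecificOptions is omitted entirely, matching the unchecked state
  ...(usePriorityTier
    ? { vendorSpecificOptions: { service_tier: "priority" } }
    : {})
};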
Configuration Example
const openaiAgent = await fetch('https://blackbox.dasha.ai/api/v1/agents', {
method: 'POST',
headers: {
'Authorization': 'Bearer YOUR_API_KEY',
'Content-Type': 'application/json'
},
body: JSON.stringify({
name: "Customer Support Agent",
config: {
primaryLanguage: "en-US",
llmConfig: {
vendor: "openai",
model: "gpt-4.1-mini",
prompt: "You are a helpful customer support agent for Acme Corp. Be professional, empathetic, and concise.",
options: {
temperature: 0.7,
maxTokens: 1000,
topP: 0.9
},
vendorSpecificOptions: {
service_tier: "priority" // Optional: enable for lower latency
}
},
ttsConfig: { /* ... */ }
}
})
});
Groq
Groq delivers the fastest LLM inference speeds, ideal for real-time voice applications.
Available Models
llama-3.3-70b-versatile
- Production-ready model with broad capabilities
- Context Window: 32k tokens
- Best For: General-purpose voice agents
- Speed: Extremely fast inference
- Use Cases: Any voice application requiring instant responses
llama-3.1-8b-instant
- Ultra-fast inference for simple tasks
- Context Window: 32k tokens
- Best For: High-volume, straightforward conversations
- Speed: Fastest available
- Use Cases: Quick routing, basic Q&A, simple interactions
deepseek-r1-distill-llama-70b
- Reasoning-enhanced model
- Context Window: 32k tokens
- Best For: Decision-making and logical reasoning
- Use Cases: Technical support, troubleshooting
gemma2-9b-it
- Instruction-following specialist
- Context Window: 32k tokens
- Best For: Structured conversations with clear workflows
- Use Cases: Appointment booking, form filling
qwen-2.5-coder-32b
- Code-specialized model
- Context Window: 32k tokens
- Best For: Technical conversations and code discussion
- Use Cases: Developer support, API assistance
Configuration Example
const groqAgent = await fetch('https://blackbox.dasha.ai/api/v1/agents', {
method: 'POST',
headers: {
'Authorization': 'Bearer YOUR_API_KEY',
'Content-Type': 'application/json'
},
body: JSON.stringify({
name: "Fast Response Agent",
config: {
primaryLanguage: "en-US",
llmConfig: {
vendor: "groq",
model: "llama-3.3-70b-versatile",
prompt: "You are a quick and efficient assistant. Provide direct, concise answers.",
options: {
temperature: 0.6,
maxTokens: 800,
topP: 0.95
}
},
ttsConfig: { /* ... */ }
}
})
});
Groq excels at low-latency responses. Pair it with fast TTS providers like Cartesia or Dasha for the quickest possible conversations.
Grok (xAI)
Grok models from xAI provide cutting-edge reasoning and conversational capabilities.
Available Models
grok-2
- Latest flagship model with enhanced reasoning
- Context Window: 128k tokens
- Best For: Complex reasoning and nuanced conversations
- Use Cases: Advisory roles, complex customer issues
grok-2-mini
- Faster, cost-effective version
- Context Window: 128k tokens
- Best For: Balanced performance and cost
- Use Cases: General-purpose voice agents
grok-3-mini
- Latest mini model with improved reasoning
- Context Window: 128k tokens
- Best For: Production agents requiring strong reasoning at lower cost
- Use Cases: Sales, support, complex routing
Configuration Example
const grokAgent = await fetch('https://blackbox.dasha.ai/api/v1/agents', {
method: 'POST',
headers: {
'Authorization': 'Bearer YOUR_API_KEY',
'Content-Type': 'application/json'
},
body: JSON.stringify({
name: "Advisory Agent",
config: {
primaryLanguage: "en-US",
llmConfig: {
vendor: "grok",
model: "grok-2-mini",
prompt: "You are a knowledgeable advisor. Provide thoughtful, well-reasoned guidance.",
options: {
temperature: 0.8,
maxTokens: 1200
}
},
ttsConfig: { /* ... */ }
}
})
});
DeepSeek
DeepSeek offers breakthrough cost-efficiency while maintaining GPT-4 level quality.
Available Models
deepseek-r1
- Breakthrough reasoning model
- Context Window: 128k tokens
- Cost Efficiency: 30x more cost-efficient than GPT-4
- Best For: Budget-conscious production deployments
- Use Cases: Any application requiring GPT-4 quality at lower cost
- Notable Feature: Advanced reasoning capabilities
deepseek-v3
- GPT-4 equivalent performance
- Context Window: 128k tokens
- Best For: High-quality conversations at reduced cost
- Use Cases: Customer support, sales, complex interactions
Configuration Example
const deepseekAgent = await fetch('https://blackbox.dasha.ai/api/v1/agents', {
method: 'POST',
headers: {
'Authorization': 'Bearer YOUR_API_KEY',
'Content-Type': 'application/json'
},
body: JSON.stringify({
name: "Cost-Efficient Agent",
config: {
primaryLanguage: "en-US",
llmConfig: {
vendor: "deepseek",
model: "deepseek-r1",
prompt: "You are an intelligent assistant focused on solving problems efficiently.",
options: {
temperature: 0.7,
maxTokens: 1000,
topP: 0.9
}
},
ttsConfig: { /* ... */ }
}
})
});
DeepSeek’s 30x cost advantage makes it well suited to high-volume applications. Test it against OpenAI for your use case; you may find comparable quality at significantly lower cost.
Custom Compatible Provider
Use any OpenAI-compatible API endpoint, including self-hosted models or alternative providers.
When to Use Custom Providers
- Self-hosted models for data privacy
- Alternative providers with OpenAI-compatible APIs
- Custom fine-tuned models
- On-premise deployments
Required Configuration
Custom providers require additional configuration:
const customAgent = await fetch('https://blackbox.dasha.ai/api/v1/agents', {
method: 'POST',
headers: {
'Authorization': 'Bearer YOUR_API_KEY',
'Content-Type': 'application/json'
},
body: JSON.stringify({
name: "Self-Hosted Agent",
config: {
primaryLanguage: "en-US",
llmConfig: {
vendor: "customCompatible",
model: "your-model-name",
endpoint: "https://api.yourprovider.com/v1",
apiKey: "your-provider-api-key",
prompt: "You are a custom AI assistant.",
options: {
temperature: 0.7,
maxTokens: 1000
}
},
ttsConfig: { /* ... */ }
}
})
});
Custom Provider Fields
Endpoint URL (Required)
- Full URL to OpenAI-compatible API
- Must support the /chat/completions endpoint
- Format: https://api.example.com/v1
- Validation: Must be a valid HTTPS URL
API Key (Required)
- Authentication key for your custom provider
- Minimum 10 characters
- Stored securely, never exposed in responses
Model ID (Required)
- Model identifier as expected by your provider
- Can be any string recognized by your endpoint
- Example: llama-2-70b, custom-gpt-4, fine-tuned-model-v2
Custom providers must implement the OpenAI Chat Completions API format. Incompatible APIs will cause agent failures. Test thoroughly before production use.
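Before wiring a custom endpoint into an agent, it can help to confirm that it actually answers the Chat Completions route in the expected shape. A minimal smoke test, assuming placeholder values for the endpoint, key, and model name:
// Smoke test for an OpenAI-compatible endpoint (all values below are placeholders)
const endpoint = "https://api.yourprovider.com/v1";
const apiKey = "your-provider-api-key";

const res = await fetch(`${endpoint}/chat/completions`, {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${apiKey}`,
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    model: "your-model-name",
    messages: [{ role: "user", content: "Reply with one short sentence." }],
    max_tokens: 20
  })
});

// A compatible endpoint returns HTTP 200 with choices[0].message.content populated
const data = await res.json();
console.log(res.status, data.choices?.[0]?.message?.content);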
LLM Parameters
All LLM vendors support standard configuration parameters that control response behavior.
Temperature
Controls randomness and creativity in responses.
Range: 0.0 to 2.0
Default: Varies by vendor (typically 0.7-1.0)
Recommended: 0.6-0.8 for voice agents
How Temperature Works:
- Low (0.0-0.5): Focused, deterministic, consistent
  - Use for: FAQs, factual information, structured workflows
  - Example: “What are your business hours?” → Same answer every time
- Medium (0.6-0.9): Balanced creativity and consistency
  - Use for: General conversation, customer support
  - Example: Friendly greetings with natural variation
- High (1.0-2.0): Creative, varied, unpredictable
  - Use for: Storytelling, brainstorming (rarely for voice agents)
  - Warning: May produce hallucinations or inconsistent information
// Conservative agent for factual responses
llmConfig: {
options: {
temperature: 0.3 // Very focused and consistent
}
}
// Conversational agent with natural variation
llmConfig: {
options: {
temperature: 0.7 // Balanced and natural
}
}
// Creative agent (use cautiously)
llmConfig: {
options: {
temperature: 1.2 // More creative but less predictable
}
}
For production voice agents, we recommend a temperature between 0.6 and 0.8. Lower values can feel robotic; higher values risk inconsistency.
Max Tokens
Limits the maximum length of the LLM’s response.
Type: Positive integer
Default: Varies by model (often 2048-4096)
Recommended: 500-1000 for voice agents
Why Limit Tokens:
- Cost Control: Reduce token usage and API costs
- Conciseness: Force agent to be brief (important for voice)
- Performance: Faster response generation
- User Experience: Avoid long-winded voice responses
Token Estimation:
- ~4 characters per token (English)
- ~1 word = 1.3 tokens (average)
- 100 tokens ≈ 75 words ≈ 300 characters
// Brief responses for quick interactions
llmConfig: {
options: {
maxTokens: 150 // ~100 words, 20-30 seconds of speech
}
}
// Standard conversation
llmConfig: {
options: {
maxTokens: 500 // ~375 words, 60-90 seconds of speech
}
}
// Detailed explanations
llmConfig: {
options: {
maxTokens: 1000 // ~750 words, 2-3 minutes of speech
}
}
Voice conversations over 60 seconds per turn feel unnatural. Keep maxTokens around 500-700 for best user experience.
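To turn a target response length into a maxTokens budget, a rough sizing helper based on the estimates above (~1 word ≈ 1.3 tokens) can be handy. The function name and rounding are illustrative, not part of the BlackBox API:
// Convert a target word count into an approximate maxTokens value
function maxTokensForWords(targetWords) {
  return Math.ceil(targetWords * 1.3);
}

maxTokensForWords(100); // ≈ 130 tokens, in line with the "brief responses" example
maxTokensForWords(375); // ≈ 488 tokens, close to the "standard conversation" example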
Top P (Nucleus Sampling)
Alternative to temperature for controlling randomness via probability mass.
Range: 0.0 to 1.0
Default: 1.0 (disabled)
Recommended: 0.9-0.95 when used
How Top P Works:
Top P limits the model to the most probable tokens whose cumulative probability reaches P.
- 0.9: Only consider tokens making up the top 90% of probability mass
  - More focused, reduces unlikely words
  - Good for consistent, reliable responses
- 1.0: Consider all tokens
  - Full distribution, maximum flexibility
  - Standard behavior
// Focused responses using Top P
llmConfig: {
options: {
temperature: 1.0, // Keep standard
topP: 0.9 // Limit to top 90% probability
}
}
Temperature vs Top P:
OpenAI recommends using either temperature or topP, not both. If you set both, temperature takes precedence in most implementations.
| Approach | Temperature | Top P | Use When |
|---|---|---|---|
| Temperature Control | 0.6-0.8 | 1.0 (default) | Standard voice agents |
| Top P Control | 1.0 (default) | 0.9-0.95 | Need precise probability control |
| Conservative | 0.5 | 1.0 | Factual, consistent responses |
| Balanced | 0.7 | 1.0 | Natural conversations |
Vendor Comparison
| Vendor | Average Latency | Context Window | Cost (Relative) |
|---|---|---|---|
| Groq | 50-100ms | 32k | Low |
| OpenAI (nano) | 200-400ms | 128k | Low |
| OpenAI (mini) | 300-600ms | 128k | Medium |
| DeepSeek | 400-800ms | 128k | Very Low |
| Grok (mini) | 400-700ms | 128k | Medium |
| OpenAI (4.1) | 600-1200ms | 128k | High |
For voice agents, latency matters more than for chat. Aim for total response time (LLM + TTS) under 1.5 seconds for natural conversations.
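During development, one simple way to track that budget is to time the round trip yourself. A sketch, where makeTestRequest stands in for however you exercise your agent (it is not a BlackBox API call):
// Generic timing wrapper; makeTestRequest is a placeholder async function you supply
async function measureLatency(makeTestRequest) {
  const start = performance.now();
  await makeTestRequest();
  const elapsedMs = performance.now() - start;
  console.log(`Round trip: ${elapsedMs.toFixed(0)} ms (target: under 1500 ms including TTS)`);
  return elapsedMs;
}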
Quality Comparison
| Vendor | Reasoning | Creativity | Instruction Following | Best Use Case |
|---|---|---|---|---|
| OpenAI GPT-4.1 | Excellent | Excellent | Excellent | Complex support |
| Grok-2 | Excellent | Very Good | Excellent | Advisory roles |
| DeepSeek-R1 | Very Good | Good | Very Good | Cost-conscious production |
| Groq Llama-3.3 | Good | Good | Very Good | Speed-critical apps |
| OpenAI o3-mini | Excellent | Good | Very Good | Reasoning tasks |
Cost Efficiency
| Vendor & Model | Cost Tier | Best Value For |
|---|---|---|
| DeepSeek-R1 | Lowest | High-volume production |
| Groq (any) | Low | Speed + cost balance |
| OpenAI nano | Low-Mid | Simple interactions |
| OpenAI mini | Mid | Balanced quality/cost |
| Grok mini | Mid | Advanced reasoning at lower cost |
| OpenAI 4.1 | High | Premium quality required |
Choosing the Right LLM
Decision Framework
Start with these questions:
- What’s your priority?
  - Speed → Groq
  - Quality → OpenAI GPT-4.1
  - Cost → DeepSeek
  - Balance → OpenAI mini or Grok mini
- How complex are conversations?
  - Simple Q&A → Groq llama-3.1-8b-instant
  - General support → OpenAI mini or DeepSeek-R1
  - Complex reasoning → OpenAI 4.1 or Grok-2
- What’s your call volume?
  - High volume → DeepSeek (cost efficiency)
  - Medium volume → OpenAI mini
  - Low volume → OpenAI 4.1 (premium quality)
- Do you need special features?
  - Vision/multimodal → OpenAI GPT-4o
  - Code discussion → Groq qwen-2.5-coder
  - Reasoning → DeepSeek-R1 or OpenAI o3-mini
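If you prefer to encode the first question in code, a small helper can make the default choice explicit. This is a sketch of the mapping above; the function name and returned values are illustrative defaults, not an official API:
// Map the priority answer to a starting vendor/model from the framework above
function defaultLlmConfigFor(priority) {
  switch (priority) {
    case "speed":   return { vendor: "groq",     model: "llama-3.3-70b-versatile" };
    case "quality": return { vendor: "openai",   model: "gpt-4.1" };
    case "cost":    return { vendor: "deepseek", model: "deepseek-r1" };
    case "balance": return { vendor: "openai",   model: "gpt-4.1-mini" };
    default:        return { vendor: "openai",   model: "gpt-4.1-mini" };
  }
}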
Common Configurations
Customer Support Agent
llmConfig: {
vendor: "openai",
model: "gpt-4.1-mini",
options: {
temperature: 0.7,
maxTokens: 600
},
vendorSpecificOptions: {
service_tier: "priority" // Optional: for lower latency
}
}
High-Speed Lead Qualifier
llmConfig: {
vendor: "groq",
model: "llama-3.3-70b-versatile",
options: {
temperature: 0.6,
maxTokens: 400
}
}
Cost-Optimized Production Agent
llmConfig: {
vendor: "deepseek",
model: "deepseek-r1",
options: {
temperature: 0.7,
maxTokens: 800
}
}
Complex Advisory Agent
llmConfig: {
vendor: "grok",
model: "grok-2",
options: {
temperature: 0.8,
maxTokens: 1000
}
}
Testing and Optimization
A/B Testing LLMs
Compare different vendors for your specific use case:
- Create identical agents with different LLM configs
- Run parallel test calls with the same scenarios
- Measure:
  - Response quality (user satisfaction)
  - Response speed (average latency)
  - Response length (token usage)
  - Conversation success rate
- Compare costs over 100-1000 calls
// Agent A: OpenAI
const agentA = { llmConfig: { vendor: "openai", model: "gpt-4.1-mini" } };
// Agent B: DeepSeek
const agentB = { llmConfig: { vendor: "deepseek", model: "deepseek-r1" } };
// Agent C: Groq
const agentC = { llmConfig: { vendor: "groq", model: "llama-3.3-70b-versatile" } };
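To collect those metrics across the candidate agents, a small harness can run the same scenarios against each and aggregate the results. A sketch, where runScenario is a placeholder for however you drive a single test call and report its outcome:
// Illustrative harness: runScenario(agent, scenario) is assumed to resolve to
// { latencyMs, tokensUsed, success } for one test conversation
async function compareAgents(agents, scenarios, runScenario) {
  const results = {};
  for (const [name, agent] of Object.entries(agents)) {
    const runs = [];
    for (const scenario of scenarios) {
      runs.push(await runScenario(agent, scenario));
    }
    results[name] = {
      avgLatencyMs: runs.reduce((sum, r) => sum + r.latencyMs, 0) / runs.length,
      avgTokens: runs.reduce((sum, r) => sum + r.tokensUsed, 0) / runs.length,
      successRate: runs.filter(r => r.success).length / runs.length
    };
  }
  return results;
}

// Usage: await compareAgents({ agentA, agentB, agentC }, testScenarios, runScenario)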
Parameter Tuning
Temperature Tuning:
- Start at 0.7 (balanced)
- Test with real conversation scenarios
- Adjust based on observations:
- Too robotic/repetitive → Increase to 0.8-0.9
- Too creative/inconsistent → Decrease to 0.5-0.6
- Hallucinating information → Decrease to 0.3-0.5
MaxTokens Tuning:
- Monitor average response length in production
- If responses frequently truncated → Increase maxTokens
- If responses too long → Decrease maxTokens or improve prompt
- Optimal: 90% of responses complete, none over 60 seconds spoken
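A quick way to spot truncation while tuning is to check how many responses stop because they hit the token cap. The sketch below assumes you log OpenAI-style completion metadata where finish_reason === "length" marks a cut-off response; adapt the field name to whatever your logs actually contain:
// responses is an array of logged completions with an OpenAI-style finish_reason field
function truncationRate(responses) {
  const truncated = responses.filter(r => r.finish_reason === "length").length;
  return truncated / responses.length;
}

// If more than ~10% of responses are truncated, raise maxTokens
// or tighten the prompt so answers finish sooner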
Next Steps
Now that you’ve configured your LLM, continue building your agent:
API Cross-References