Multi-model routing - Beamdesk Docs

Model roster

Beamdesk routes AI requests across 15 production models from six families: Claude, GPT, Gemini, GLM, Kimi, and MiniMax. Each persona can specify a preferred model family and a fallback chain. Routing balances latency, cost, and capability per request.

Claude

Opus 4.7, Sonnet 4.6, Haiku 4.3 — best for complex reasoning, long context, and strict instruction following.

GPT

GPT-5.5, GPT-4.5, GPT-4o — strong at web search, code generation, and agentic tool chaining.

Gemini

3.1 Pro, 3.1 Flash, 1.5 Pro — excellent at deep research, Google Search grounding, and multimodal tasks.

GLM

GLM-5, GLM-4 — native Chinese support, strong math reasoning, and cost-effective generation.

Kimi

K2.5 Instruct, K2.5 Vision — best at design-to-code, visual QA, and OCR.

MiniMax

M2.7, M2.5 — ultra-fast batch operations, translations, and boilerplate generation.

Routing logic

Default routing uses a tiered fallback chain: Haiku for quick responses, Sonnet for standard requests, and Opus for complex tasks requiring deep reasoning. Personas can override this per-channel or per-intent.

Persona model configuration:
Routing tier: auto  // haiku → sonnet → opus
Fallback family: claude
Timeout ms: 30000
Max output tokens: 8192

Per-channel overrides:
  email: { tier: 'opus', fallback: 'claude' }
  chat_widget: { tier: 'sonnet', fallback: 'gpt' }
  voice: { tier: 'sonnet', fallback: 'claude' }

Per-intent routing:
  'pricing_question': { model: 'gpt-5.5' }
  'technical_debug': { model: 'claude-opus' }
  'refund_request': { model: 'claude-opus', consensus: true }
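
The three layers above (persona defaults, per-channel overrides, per-intent routing) can be read as a lookup with precedence. The sketch below assumes intent settings win over channel settings, which win over persona defaults; the documentation does not state this order. `resolveRoute`, `PersonaRouting`, and the field names are hypothetical and not part of the Beamdesk SDK.

```typescript
// Hypothetical shapes mirroring the config listing above; not Beamdesk SDK types.
type Tier = 'auto' | 'haiku' | 'sonnet' | 'opus';

interface Route {
  tier?: Tier;
  fallback?: string;
  model?: string;
  consensus?: boolean;
}

interface PersonaRouting {
  defaults: Route;
  perChannel: Record<string, Route>;
  perIntent: Record<string, Route>;
}

// Assumed precedence: intent > channel > persona defaults.
function resolveRoute(cfg: PersonaRouting, channel: string, intent?: string): Route {
  const channelRoute = cfg.perChannel[channel] ?? {};
  const intentRoute = intent !== undefined ? cfg.perIntent[intent] ?? {} : {};
  // Later spreads overwrite earlier ones, so the most specific layer wins.
  return { ...cfg.defaults, ...channelRoute, ...intentRoute };
}

const routing: PersonaRouting = {
  defaults: { tier: 'auto', fallback: 'claude' },
  perChannel: {
    email: { tier: 'opus', fallback: 'claude' },
    chat_widget: { tier: 'sonnet', fallback: 'gpt' },
    voice: { tier: 'sonnet', fallback: 'claude' },
  },
  perIntent: {
    pricing_question: { model: 'gpt-5.5' },
    technical_debug: { model: 'claude-opus' },
    refund_request: { model: 'claude-opus', consensus: true },
  },
};

// A refund request arriving over the chat widget keeps the channel's
// fallback family but picks up the intent's pinned model and consensus flag.
const refundOverChat = resolveRoute(routing, 'chat_widget', 'refund_request');
console.log(refundOverChat);
```

Under this assumption, a channel or intent that has no entry simply inherits the persona defaults.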

When a model times out or returns an error, routing falls back to the next model in the chain. If the fallback chain is exhausted, the request fails with a clear error. Timeouts and max output tokens can also be configured per channel.
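
The fallback behavior described here can be sketched as a loop over the chain with a per-call timeout. This is a minimal illustration, not Beamdesk's implementation: `ModelCall`, `withTimeout`, and `routeWithFallback` are hypothetical names, and the model identifiers passed in are placeholders.

```typescript
// Hypothetical signature for whatever actually calls a provider.
type ModelCall = (model: string, prompt: string) => Promise<string>;

// Reject if the work does not settle within timeoutMs.
function withTimeout<T>(work: Promise<T>, timeoutMs: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`timed out after ${timeoutMs} ms`)),
      timeoutMs,
    );
    work.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (err) => { clearTimeout(timer); reject(err); },
    );
  });
}

// Try each model in order; throw once every model in the chain has failed.
async function routeWithFallback(
  chain: string[],
  prompt: string,
  call: ModelCall,
  timeoutMs = 30000, // matches "Timeout ms: 30000" in the persona config
): Promise<{ model: string; output: string }> {
  const failures: string[] = [];
  for (const model of chain) {
    try {
      const output = await withTimeout(call(model, prompt), timeoutMs);
      return { model, output };
    } catch (err) {
      // Record why this model failed, then move on to the next one.
      failures.push(`${model}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  throw new Error(`Fallback chain exhausted (${failures.join('; ')})`);
}
```

Collecting the per-model failure reasons is one way to make the final error informative when the whole chain is exhausted.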

Per-persona preferences

Each persona can specify a model family preference, a routing tier, and a consensus mode. This lets you use fast models for low-risk queries and expensive models for high-risk decisions.

Persona: Billing
System prompt:
  You handle refunds, invoices, and subscription changes.
  Always verify policy before approving exceptions.

Model settings:
  primary_family: claude
  fallback_family: gpt
  tier: opus
  consensus_mode: auto

Consensus triggers:
  - refund_amount >= 50
  - policy_exception
  - customer_risk_level: high

When consensus is enabled, routing runs:
  1. Primary model generates draft
  2. Secondary model verifies draft
  3. If disagreement: third model breaks tie

Setting consensus_mode to auto enables consensus only for high-risk decisions detected by guardrails. Manual mode forces consensus on every request. Disabled mode skips consensus entirely.
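
The three modes can be expressed as a small decision function. The sketch below assumes that guardrail detection in auto mode reduces to the Billing persona's three triggers listed above. `RequestContext`, its field names, and the `low` and `medium` risk levels are hypothetical.

```typescript
type ConsensusMode = 'auto' | 'manual' | 'disabled';

// Hypothetical request context; fields follow the consensus triggers above.
interface RequestContext {
  refundAmount?: number;
  policyException?: boolean;
  customerRiskLevel?: 'low' | 'medium' | 'high';
}

// True if any of the Billing persona's triggers fires.
function triggersFire(ctx: RequestContext): boolean {
  return (
    (ctx.refundAmount ?? 0) >= 50 ||
    ctx.policyException === true ||
    ctx.customerRiskLevel === 'high'
  );
}

function shouldRunConsensus(mode: ConsensusMode, ctx: RequestContext): boolean {
  switch (mode) {
    case 'manual':
      return true; // forced for every request
    case 'disabled':
      return false; // skipped entirely
    case 'auto':
      return triggersFire(ctx); // only when a trigger marks the request high-risk
  }
}

console.log(shouldRunConsensus('auto', { refundAmount: 75 })); // true
console.log(shouldRunConsensus('auto', { refundAmount: 20 })); // false
```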

Cost transparency

Beamdesk adds zero markup to model costs. You pay the provider rate directly: Claude, GPT, Gemini, GLM, Kimi, and MiniMax are all billed at their published per-token rates. Usage breakdown is available in workspace billing.

Approximate costs per 1K tokens

Model	Input	Output
Claude Opus	$0.005	$0.025
Claude Sonnet	$0.003	$0.015
GPT-5.5	$0.003	$0.015
Gemini 3.1 Pro	$0.001	$0.006
MiniMax M2.7	$0.0002	$0.001
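
As a worked example of the arithmetic, the sketch below estimates the cost of a single request from the approximate rates in the table. `RATES_PER_1K` and `estimateCostUsd` are hypothetical helper names, and actual invoices depend on the providers' published rates.

```typescript
// Approximate USD rates per 1K tokens, copied from the table above.
const RATES_PER_1K: Record<string, { input: number; output: number }> = {
  'Claude Opus': { input: 0.005, output: 0.025 },
  'Claude Sonnet': { input: 0.003, output: 0.015 },
  'GPT-5.5': { input: 0.003, output: 0.015 },
  'Gemini 3.1 Pro': { input: 0.001, output: 0.006 },
  'MiniMax M2.7': { input: 0.0002, output: 0.001 },
};

// cost = (input tokens / 1000) * input rate + (output tokens / 1000) * output rate
function estimateCostUsd(model: string, inputTokens: number, outputTokens: number): number {
  const rate = RATES_PER_1K[model];
  if (!rate) throw new Error(`No rate listed for ${model}`);
  return (inputTokens / 1000) * rate.input + (outputTokens / 1000) * rate.output;
}

// 2,000 input + 500 output tokens on Claude Sonnet:
// 2 * 0.003 + 0.5 * 0.015 = 0.006 + 0.0075 = 0.0135
console.log(estimateCostUsd('Claude Sonnet', 2000, 500).toFixed(4)); // "0.0135"
```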

Consensus mode

Consensus mode runs multiple models on the same request and compares outputs. Use consensus for high-stakes decisions: refunds, policy exceptions, security questions, and account changes. The tradeoff is higher cost and latency for higher reliability.

Consensus configuration:
  enabled: true
  quorum: 2 // 2 of 3 models must agree
  timeout_ms: 60000

Model selection for consensus:
  - primary: persona preferred model
  - verifier: different family (avoid echo chamber)
  - tiebreaker: opus or gpt-5.5

Example refund flow:
  1. Draft model: Claude Opus suggests refund
  2. Verifier: GPT-5.5 checks against policy
  3. Tiebreaker: Gemini searches for similar cases
  4. Decision: block (policy violation found)

Result:
  - Action: abstain
  - Citations: [policy_kb_123, case_456]
  - Reason: "Refund exceeds 30-day window without override"

Consensus multiplies cost and latency by roughly 2-3x on the requests where it runs. Use it selectively on high-risk queries identified by guardrails, not for every interaction.
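
The draft, verify, and tiebreak sequence with a quorum of 2 can be sketched as follows. This is a simplified model under stated assumptions: each model casts an independent approve or block vote (in the flow described above the verifier reviews the draft), and the mapping from the agreed decision to the final reported action is not modeled. `Reviewer` and `runConsensus` are hypothetical names.

```typescript
type Decision = 'approve' | 'block';

// Hypothetical reviewer: any async function that returns a decision for a request.
type Reviewer = (request: string) => Promise<Decision>;

interface ConsensusResult {
  decision: Decision | null; // null when no option reaches the quorum
  votes: Decision[];         // one entry per model actually called
}

async function runConsensus(
  request: string,
  primary: Reviewer,
  verifier: Reviewer,
  tiebreaker: Reviewer,
  quorum = 2,
): Promise<ConsensusResult> {
  const votes: Decision[] = [await primary(request), await verifier(request)];
  // Only call the third model when the first two disagree.
  if (votes[0] !== votes[1]) {
    votes.push(await tiebreaker(request));
  }
  for (const candidate of ['approve', 'block'] as const) {
    if (votes.filter((v) => v === candidate).length >= quorum) {
      return { decision: candidate, votes };
    }
  }
  return { decision: null, votes };
}
```

In this sketch a request costs two model calls when the first two models agree and three when they disagree, which lines up with the 2-3x overhead noted above.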

SDK configuration

import { Beamdesk } from '@beamdesk/sdk';

const beam = new Beamdesk({
  apiKey: process.env.BEAMDESK_API_KEY!,
  baseUrl: 'https://beamdesk.preview.softblaze.net',
});

// Create persona with model preferences
await beam.personas.create({
  name: 'Support',
  systemPrompt: 'You resolve...',
  modelConfig: {
    primaryFamily: 'claude',
    fallbackFamily: 'gpt',
    tier: 'auto', // haiku → sonnet → opus
    consensusMode: 'auto',
    consensusTriggers: [
      { type: 'refund_amount', threshold: 50 },
      { type: 'policy_exception' },
      { type: 'customer_risk', threshold: 'high' },
    ],
    perChannel: {
      email: { tier: 'opus', fallback: 'claude' },
      chat_widget: { tier: 'sonnet', fallback: 'gpt' },
    },
  },
});