Intelligence Systems
The 13 intelligence systems that power BrainstormRouter's adaptive model routing.
# Intelligence Systems
BrainstormRouter runs 13 interconnected intelligence systems that continuously optimize routing decisions. These systems learn from every request to improve model selection, reduce costs, and maintain quality.
1. Thompson Sampling
The core routing algorithm. Maintains a Bayesian posterior distribution for each model's performance on each task type. Uses UCB1 (Upper Confidence Bound) during cold start, when little data is available, and transitions to Gaussian Thompson Sampling as data accumulates. This naturally balances exploration of new models with exploitation of known performers.
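The cold-start handoff described above can be sketched as follows. This is a minimal illustration, not BrainstormRouter's implementation: `ArmStats`, `select_model`, and the `COLD_START_PULLS` threshold are hypothetical names, and the posterior is approximated as a normal distribution around the running mean reward.

```python
import math
import random


class ArmStats:
    """Running reward statistics for one (model, task type) pair."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations (Welford's method)

    def update(self, reward):
        self.n += 1
        delta = reward - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (reward - self.mean)

    def variance(self):
        # Fall back to a wide prior variance until two samples exist.
        return self.m2 / (self.n - 1) if self.n > 1 else 1.0


COLD_START_PULLS = 10  # assumed average pulls per model before leaving UCB1


def select_model(arms, rng=random):
    """Pick a model name from a dict of name -> ArmStats."""
    # Try every model once before scoring anything.
    for name, stats in arms.items():
        if stats.n == 0:
            return name
    total = sum(stats.n for stats in arms.values())

    if total < COLD_START_PULLS * len(arms):
        # Cold start: UCB1 adds an optimism bonus that shrinks with pulls.
        def ucb(name):
            s = arms[name]
            return s.mean + math.sqrt(2.0 * math.log(total) / s.n)
        return max(arms, key=ucb)

    # Warm: Gaussian Thompson Sampling draws one sample per model from
    # N(mean, variance / n) and routes to the highest draw.
    def draw(name):
        s = arms[name]
        return rng.gauss(s.mean, math.sqrt(s.variance() / s.n))
    return max(arms, key=draw)
```

After each response, calling `arms[model].update(reward)` with a quality-derived reward narrows that model's distribution, so well-measured models are sampled closer to their true mean.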
2. Quality Scoring
Evaluates model output quality using a multi-dimensional scoring function: correctness, completeness, coherence, and instruction following. Scores feed back into Thompson Sampling posteriors. Quality is assessed through both automated heuristics and aggregated user signals.
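One simple way to collapse the four dimensions into a single reward is a weighted average. The sketch below assumes each dimension is already scored in the range 0 to 1; the `WEIGHTS` values and the `QualityDimensions` type are illustrative assumptions, not documented BrainstormRouter parameters.

```python
from dataclasses import dataclass


@dataclass
class QualityDimensions:
    correctness: float
    completeness: float
    coherence: float
    instruction_following: float


# Assumed weights for illustration; they sum to 1.0.
WEIGHTS = {
    "correctness": 0.4,
    "completeness": 0.2,
    "coherence": 0.2,
    "instruction_following": 0.2,
}


def quality_score(dims):
    """Return the weighted average of the four dimension scores."""
    total = 0.0
    for name, weight in WEIGHTS.items():
        value = getattr(dims, name)
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
        total += weight * value
    return total
```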
3. Auto-Selector
The task classifier that analyzes incoming requests and produces a task profile with dimensions like complexity, creativity, precision, and domain. This profile determines which Thompson Sampling posterior to sample from.
4. Predictive Routing
Anticipates the next request in a conversation and pre-warms model connections. Uses conversation trajectory patterns to predict whether the next turn will need a code generation model, an explanation model, or a tool-calling model.
5. Pattern Fingerprinting
Identifies recurring request patterns across users and creates fingerprints. When a new request matches a known fingerprint, the system can skip exploration and route directly to the proven best model. Patterns are anonymized and aggregated.
6. Agent Reputation
Tracks per-agent performance when BrainstormRouter is used within agent frameworks. Agents that consistently produce good results with certain models are routed to those models preferentially. Reputation scores decay over time to adapt to model updates.
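Time decay of this kind is commonly implemented as an exponentially weighted average, where older observations count for less. The following sketch assumes a half-life model; `HALF_LIFE_DAYS` and `decayed_reputation` are hypothetical names chosen for illustration.

```python
HALF_LIFE_DAYS = 14.0  # assumed: an observation loses half its weight in 14 days


def decayed_reputation(observations, now):
    """Return the time-weighted mean score, or None without observations.

    observations: iterable of (timestamp_in_days, score) pairs.
    now: current time in the same day-based units.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for timestamp, score in observations:
        age = max(0.0, now - timestamp)
        weight = 0.5 ** (age / HALF_LIFE_DAYS)
        weighted_sum += weight * score
        weight_total += weight
    return weighted_sum / weight_total if weight_total > 0 else None
```

With a 14-day half-life, a perfect score from today outweighs a zero from two weeks ago by two to one, so the combined reputation is 2/3 rather than 1/2.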
7. Degradation Ladder
When a model's performance drops (higher error rates, increased latency, lower quality scores), the degradation ladder progressively reduces its traffic share. The ladder has five rungs, from minor traffic reduction to full circuit break.
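A five-rung ladder can be modeled as a small state machine that moves one rung per health observation. The traffic shares below are assumptions for illustration only; the text specifies five rungs ending in a full circuit break but not the intermediate values.

```python
# Index 0 means "off the ladder" (full traffic). Rungs 1 through 5 are the
# five degradation steps; the specific shares are assumed values.
RUNG_TRAFFIC_SHARE = [1.0, 0.75, 0.5, 0.25, 0.1, 0.0]


class DegradationLadder:
    """Tracks one model's current rung."""

    def __init__(self):
        self.rung = 0

    def observe(self, healthy):
        """Step up or down one rung and return the new traffic share."""
        if healthy:
            self.rung = max(0, self.rung - 1)
        else:
            self.rung = min(len(RUNG_TRAFFIC_SHARE) - 1, self.rung + 1)
        return RUNG_TRAFFIC_SHARE[self.rung]
```

Moving one rung at a time in both directions means a single bad interval costs a model some traffic without cutting it off, and recovery is equally gradual.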
8. Cascade
For critical requests, the cascade system sends the request to multiple models simultaneously and returns the best result. Cost-aware cascading only triggers when the quality variance between top candidates is high enough to justify the extra spend.
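The cost-aware trigger can be sketched as a simple gate. The function name, the thresholds, and the use of population variance over predicted quality scores are all assumptions; the text states only that cascading requires high quality variance among top candidates and must justify the extra spend.

```python
from statistics import pvariance


def should_cascade(is_critical, predicted_quality, extra_cost,
                   variance_threshold=0.01, max_extra_cost=0.05):
    """Decide whether to fan a request out to several models.

    predicted_quality: expected quality scores of the top candidates.
    extra_cost: estimated added spend (in dollars) of calling them all.
    """
    if not is_critical or len(predicted_quality) < 2:
        return False
    if extra_cost > max_extra_cost:
        return False
    # Only fan out when the candidates disagree enough to matter.
    return pvariance(predicted_quality) >= variance_threshold
```

For example, candidates predicted at 0.9 and 0.5 have a variance of 0.04 and trigger a cascade, while candidates at 0.90 and 0.89 do not.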
9. Performance Tracker
Real-time monitoring of model latency, throughput, and error rates. Feeds into the degradation ladder and circuit breaker. Tracks P50, P95, and P99 latencies per model per task type.
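Per-model, per-task percentile tracking can be illustrated with the standard library. This sketch keeps every sample in memory, which a production tracker would likely replace with a sliding window or a streaming estimator; `LatencyTracker` is a hypothetical name.

```python
from collections import defaultdict
from statistics import quantiles


class LatencyTracker:
    """Stores latency samples keyed by (model, task type)."""

    def __init__(self):
        self._samples = defaultdict(list)

    def record(self, model, task_type, latency_ms):
        self._samples[(model, task_type)].append(latency_ms)

    def percentiles(self, model, task_type):
        """Return P50, P95, and P99, or None with fewer than two samples."""
        data = self._samples.get((model, task_type), [])
        if len(data) < 2:
            return None
        # n=100 yields 99 cut points; index k-1 is the k-th percentile.
        cuts = quantiles(data, n=100, method="inclusive")
        return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}
```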
10. Budget Forecaster
Projects future costs based on current usage patterns and planned workloads. Alerts when spending is on track to exceed configured limits. Recommends strategy adjustments to stay within budget.
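The simplest form of such a projection is a linear run-rate extrapolation, sketched below. It assumes spending continues at the average daily rate observed so far and ignores planned workloads, which a fuller forecaster would add on top; both function names are hypothetical.

```python
def forecast_period_spend(spend_to_date, days_elapsed, days_in_period):
    """Extrapolate spend for the whole period from the average daily rate."""
    if days_elapsed <= 0 or days_in_period < days_elapsed:
        raise ValueError("need 0 < days_elapsed <= days_in_period")
    return spend_to_date / days_elapsed * days_in_period


def will_exceed_budget(spend_to_date, days_elapsed, days_in_period, limit):
    """Return True when the projected spend is above the configured limit."""
    projected = forecast_period_spend(spend_to_date, days_elapsed, days_in_period)
    return projected > limit
```

For example, $100 spent in the first 10 days of a 30-day period projects to $300, which would trip an alert against a $250 limit.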
11. Cost-Quality Frontier
Maps the Pareto frontier of cost vs. quality for each task type. Identifies models that are dominated (worse quality and higher cost than an available alternative) and removes them from routing consideration. Updates the frontier as new data arrives.
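Dominance filtering over (cost, quality) pairs can be sketched as below. Note one assumption: this uses the standard Pareto definition, under which a model is dominated if another is no worse on both axes and strictly better on at least one, which is slightly broader than "worse on both". The function name and input shape are hypothetical.

```python
def pareto_frontier(models):
    """Return the sorted names of models that no other model dominates.

    models: dict mapping model name -> (cost, quality).
    """
    def dominates(a, b):
        # a dominates b: no more expensive, no lower quality, and
        # strictly better on at least one of the two axes.
        return (a[0] <= b[0] and a[1] >= b[1]
                and (a[0] < b[0] or a[1] > b[1]))

    return sorted(
        name
        for name, point in models.items()
        if not any(
            dominates(other_point, point)
            for other, other_point in models.items()
            if other != name
        )
    )
```

In a set such as `{"cheap": (1.0, 0.6), "mid": (2.0, 0.8), "bad": (3.0, 0.7), "top": (5.0, 0.95)}`, the model `bad` is dropped because `mid` costs less and scores higher.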
12. Circuit Breaker
Protects against provider outages. When a provider's error rate exceeds a threshold, the circuit breaker opens and all traffic is rerouted to alternatives. The breaker enters a half-open state after a cooldown period, sending test traffic to check if the provider has recovered.
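The closed, open, and half-open states can be sketched as a small state machine. The threshold, minimum request count, and cooldown values are assumed defaults, and for brevity this version lets all traffic through while half-open, whereas a real breaker would typically admit only a limited number of probes.

```python
import time


class CircuitBreaker:
    """Per-provider breaker with closed, open, and half_open states."""

    def __init__(self, error_threshold=0.5, min_requests=10,
                 cooldown_s=30.0, clock=time.monotonic):
        self.error_threshold = error_threshold
        self.min_requests = min_requests
        self.cooldown_s = cooldown_s
        self.clock = clock
        self.state = "closed"
        self.successes = 0
        self.failures = 0
        self.opened_at = 0.0

    def allow_request(self):
        """Return True if traffic may be sent to the provider."""
        if self.state == "open":
            if self.clock() - self.opened_at >= self.cooldown_s:
                self.state = "half_open"  # cooldown over: allow test traffic
                return True
            return False
        return True

    def record(self, success):
        """Report the outcome of a request that was allowed through."""
        if self.state == "open":
            return  # ignore late results from before the breaker opened
        if self.state == "half_open":
            if success:
                self._close()
            else:
                self._open()
            return
        if success:
            self.successes += 1
        else:
            self.failures += 1
        total = self.successes + self.failures
        if total >= self.min_requests and self.failures / total > self.error_threshold:
            self._open()

    def _open(self):
        self.state = "open"
        self.opened_at = self.clock()
        self.successes = self.failures = 0

    def _close(self):
        self.state = "closed"
        self.successes = self.failures = 0
```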
13. Semantic Cache
Caches responses for semantically similar requests. Uses embedding similarity to identify cache hits even when the exact wording differs. Cache entries have a TTL based on the request type: factual queries are cached longer than creative ones.
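A minimal sketch of similarity-based lookup with per-type TTLs follows. It assumes embeddings are supplied by the caller as plain lists of floats and uses a linear scan with cosine similarity; the `TTL_SECONDS` values, the 0.95 threshold, and the class name are illustrative assumptions, and a real deployment would likely use a vector index instead of a scan.

```python
import math
import time

# Assumed TTLs: factual answers stay valid longer than creative ones.
TTL_SECONDS = {"factual": 86400.0, "creative": 300.0}
DEFAULT_TTL = 300.0


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SemanticCache:
    def __init__(self, threshold=0.95, clock=time.time):
        self.threshold = threshold
        self.clock = clock
        self.entries = []  # list of (embedding, response, expires_at)

    def put(self, embedding, response, request_type):
        ttl = TTL_SECONDS.get(request_type, DEFAULT_TTL)
        self.entries.append((embedding, response, self.clock() + ttl))

    def get(self, embedding):
        """Return the most similar unexpired response, or None on a miss."""
        now = self.clock()
        self.entries = [e for e in self.entries if e[2] > now]
        best, best_sim = None, self.threshold
        for cached_embedding, response, _ in self.entries:
            sim = cosine_similarity(embedding, cached_embedding)
            if sim >= best_sim:
                best, best_sim = response, sim
        return best
```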
How They Work Together
The systems form a pipeline: the auto-selector classifies the request, the semantic cache checks for a hit, Thompson Sampling selects a model candidate, the performance tracker and circuit breaker filter out unhealthy options, the budget forecaster applies cost constraints, and the cascade triggers if confidence is low. After the response, quality scoring updates the Thompson Sampling posteriors, and pattern fingerprinting records the result.
All 13 systems are observable through the /v1/intelligence/* API endpoints and the Brainstorm CLI dashboard.