The first model trained to
orchestrate, not just code
Every AI tool asks “What code should I write?” BrainstormLLM asks “How should this feature be built?” — and routes each phase to the right model at the right cost.
Trained on real sessions from Brainstorm CLI. Deployed inside BrainstormRouter. Powers every platform in the portfolio.
0.796
Mean F1 Score
2,203
Training Trajectories
<2ms
ONNX Inference
68%
Cost Reduction
0.587
Baseline (Plan Cache)
Not every task needs every phase
A simple bug fix doesn't need architecture or documentation. A new feature needs all 9 phases. A refactor skips specification but needs verification. Every AI coding tool today treats all tasks the same — send the full prompt to one model and hope for the best.
BrainstormLLM predicts which phases a task requires, in what order, and which model should handle each one. The result: 68% cost reduction compared to running every task through the full pipeline with a quality model.
The prediction runs in under 2 milliseconds (ONNX) inside BrainstormRouter's hot path. By the time the first model receives the prompt, the optimal pipeline is already planned.
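The sub-2-millisecond budget is feasible because the exported model is a small tree ensemble, not an LLM call. A minimal NumPy stand-in (placeholder weights, illustrative feature size — not the real export) shows the shape of the hot-path call: score each phase, threshold the scores into a plan.

```python
import numpy as np

# Stand-in for the ONNX-exported predictor: per-phase scores over a task
# feature vector, thresholded into a phase plan. Weights here are random
# placeholders; the real model is the exported GBM ensemble.
PHASES = ["spe", "arc", "imp", "rev", "ver", "ref", "dep", "doc", "rep"]
rng = np.random.default_rng(0)
W = rng.normal(size=(16, len(PHASES)))  # placeholder weights

def predict_plan(features: np.ndarray, threshold: float = 0.5) -> list[str]:
    """Score every phase and keep those above the inclusion threshold."""
    probs = 1.0 / (1.0 + np.exp(-features @ W))  # sigmoid per phase
    return [phase for phase, p in zip(PHASES, probs) if p >= threshold]

plan = predict_plan(rng.normal(size=16))
```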
Predicted phase plans
| Task | Cost |
|---|---|
| Fix the null pointer in auth.ts | $0.003 |
| Add JWT middleware to all API routes | $0.008 |
| Design the notification system | $0.02 |
| Refactor the database layer to use connection pooling | $0.015 |
| Build the complete checkout flow with Stripe | $0.05 |

In the interactive view, each row also shows the nine phase columns (spe, arc, imp, rev, ver, ref, dep, doc, rep): amber = phase included, gray = skipped. Cost is estimated per task.
Training Data
2,203 real development trajectories from three sources — not synthetic data. Every trajectory records which phases were executed, their order, the models used, cost, and whether the outcome succeeded.
Orchestration Pipelines
9-phase runs captured from Brainstorm CLI production sessions
Claude Code Sessions
90K messages from real development work across 6 projects
RouterBench Dataset
Routing decision datapoints from HuggingFace for cold-start calibration
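Based on the fields named above (phases executed, their order, models used, cost, outcome), a single trajectory might look like the sketch below. Field names and values are illustrative assumptions; the actual schema is internal.

```python
from dataclasses import dataclass

# Hypothetical record shape for one training trajectory. Every field
# mirrors something the text says is captured; names are assumed.
@dataclass
class Trajectory:
    task: str
    phases: list[str]        # executed phases, in order
    models: dict[str, str]   # phase -> model that handled it
    cost_usd: float
    succeeded: bool

example = Trajectory(
    task="Fix the null pointer in auth.ts",
    phases=["imp", "rev", "ver"],
    models={"imp": "small-coder", "rev": "small-coder", "ver": "small-coder"},
    cost_usd=0.003,
    succeeded=True,
)
```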
Architecture
Not a transformer. Not a fine-tuned LLM. Nine per-phase gradient-boosted machines (GBMs), each conditioned on the outcomes of the phases before it: phase 3's model sees whether phases 1 and 2 were included. This sequential dependency is what makes the predictions accurate.
Baseline
Plan cache — keyword template matching
0.587 F1
Sequential GBMs
9 models, each conditioned on prior phases (+36% over baseline)
0.796 F1
Inference
ONNX export, runs in BrainstormRouter hot path
<2ms
Kill Gates
Pass/fail criteria stop training early on failed experiments
Per-phase
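The sequential-conditioning idea can be sketched with one gradient-boosted classifier per phase, where phase k's inputs include the decisions for phases 1 through k-1. The data below is synthetic and the feature layout is assumed; only the model count and the chaining come from the text.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(42)
N_PHASES, N_FEATURES, N_SAMPLES = 9, 8, 200

# Synthetic task features and phase labels, where phase k's label
# depends on phase k-1 (mimicking the sequential dependency).
X = rng.normal(size=(N_SAMPLES, N_FEATURES))
Y = np.zeros((N_SAMPLES, N_PHASES), dtype=int)
Y[:, 0] = (X[:, 0] > 0).astype(int)
for k in range(1, N_PHASES):
    Y[:, k] = ((X[:, k % N_FEATURES] + Y[:, k - 1]) > 0.5).astype(int)

# Train phase k's GBM on [task features | labels of phases 0..k-1].
models = []
for k in range(N_PHASES):
    Xk = np.hstack([X, Y[:, :k]])
    models.append(GradientBoostingClassifier(n_estimators=30).fit(Xk, Y[:, k]))

def predict_plan(x):
    """Predict include/skip per phase, feeding earlier decisions forward."""
    decisions = []
    for clf in models:
        xk = np.concatenate([x, decisions]).reshape(1, -1)
        decisions.append(int(clf.predict(xk)[0]))
    return decisions

plan = predict_plan(rng.normal(size=N_FEATURES))
```

At inference time each classifier is tiny, which is why the whole chain can be exported and evaluated in a low-millisecond budget.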
Every session trains the next generation
This is not a static model. Every task run through Brainstorm CLI generates a trajectory. Every trajectory flows to BrainstormRouter. The Router aggregates outcomes and periodically retrains the predictor.
The production platforms — BrainstormMSP (37 agents), Brainstorm-GTM (70 agents), Peer10, OurBookNook, FinishStrong, BrainstormEvent — all generate real-world trajectories that validate and improve predictions.
More users → more trajectories → better predictions → lower costs → more users. The flywheel compounds.
Developer uses CLI
Prompt classified, routed, executed. Full session trajectory captured.
Trajectory → Router
Which phases ran, which models, what cost, did it succeed?
Router aggregates
Per-task-type × model performance tracked. Thompson sampling updates.
LLM retrains
GBMs retrained on accumulated trajectories. New ONNX exported.
Predictions improve
Next task predicted faster, cheaper, more accurately. <2ms overhead.
Cycle repeats
Better predictions → better routing → better outcomes → better training data.
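The "Router aggregates" step mentions Thompson sampling over per-task-type model performance. A minimal Beta-Bernoulli version of that update could look like this; the arm names, priors, and success/failure bookkeeping are assumptions for illustration.

```python
import random

class ModelArm:
    """One (task-type, model) arm with a Beta posterior over success rate."""
    def __init__(self):
        self.successes, self.failures = 1, 1  # Beta(1, 1) prior

    def sample(self):
        # Draw a plausible success rate from the current posterior.
        return random.betavariate(self.successes, self.failures)

    def update(self, succeeded):
        if succeeded:
            self.successes += 1
        else:
            self.failures += 1

arms = {"small-coder": ModelArm(), "quality-model": ModelArm()}

def route(task_arms):
    # Thompson sampling: pick the arm whose sampled success rate is highest.
    return max(task_arms, key=lambda name: task_arms[name].sample())

choice = route(arms)
arms[choice].update(succeeded=True)  # outcome feeds the next decision
```

Over many trajectories the posterior concentrates on the models that actually succeed for each task type, which is the flywheel's routing half.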
What's next for BrainstormLLM
v2 (Current)
- ✓ Sequential GBMs (9 per-phase)
- ✓ 0.796 F1 on phase prediction
- ✓ ONNX export, <2ms inference
- ✓ Deployed in BrainstormRouter
v3 (Training)
- Transformer-based sequence model
- Cross-project transfer learning
- Model-specific phase routing
- Cost-quality Pareto optimization
v4 (Research)
- Natural language → full pipeline plan
- Multi-agent orchestration prediction
- Real-time trajectory streaming
- Community model (federated learning)
Every task you run makes the system smarter
Install the CLI. Run your first task. Your trajectory joins the dataset that trains the next generation of BrainstormLLM — and improves routing for everyone.