# Memory
4-block Retrieval-Managed Memory with pgvector backend for persistent agent context.
BrainstormRouter provides a server-side memory system for AI agents that need persistent context across sessions. The 4-block Retrieval-Managed Memory (RMM) architecture stores, indexes, and retrieves relevant context automatically.
## 4-Block Architecture
### Core Facts
High-priority, slowly-changing facts about the user, project, or organization. Core facts are injected into every request's context window.
Examples:
- Organization name and industry
- Tech stack and architecture decisions
- Compliance requirements and security policies
Core facts have the highest retrieval priority and are always included when context space permits.
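One way to picture "always included when context space permits" is a greedy packer that spends a token budget on core facts before anything else. The sketch below is illustrative only: `pack_context`, the word-count token estimate, and the budget value are assumptions, not BrainstormRouter's actual implementation.

```python
def estimate_tokens(text: str) -> int:
    """Crude estimate: one token per whitespace-separated word (assumption)."""
    return len(text.split())


def pack_context(core_facts: list[str], archival: list[str], budget: int) -> list[str]:
    """Fill the token budget with core facts first, then archival memories."""
    selected: list[str] = []
    remaining = budget
    # Core facts are iterated first, so they win when space is tight.
    for memory in core_facts + archival:
        cost = estimate_tokens(memory)
        if cost <= remaining:
            selected.append(memory)
            remaining -= cost
    return selected
```

With a budget of 6 estimated tokens, two short core facts fit and a four-word archival memory is skipped.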
### Archival Memory
Long-term storage for historical context, past decisions, and reference material. Archival memories are retrieved only when semantically relevant to the current request.
The archival block uses pgvector for semantic similarity search. When a request comes in, the system embeds the query and retrieves the top-k most relevant archival memories within a configurable similarity threshold.
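The retrieval step described above (embed the query, keep the top-k matches that clear a similarity threshold) can be sketched in plain Python. In the real system the search runs inside PostgreSQL via pgvector; the function names, the default `k`, and the default threshold here are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, memories, k=5, min_similarity=0.75):
    """Return up to k (text, score) pairs above the threshold, best first.

    memories is a list of (text, embedding) pairs.
    """
    scored = [(text, cosine_similarity(query_vec, emb)) for text, emb in memories]
    relevant = [item for item in scored if item[1] >= min_similarity]
    relevant.sort(key=lambda item: item[1], reverse=True)
    return relevant[:k]
```

The threshold filters before the top-k cut, so a request with no sufficiently similar memories gets an empty result rather than k weak matches.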
### Sleep-Time Extraction
Background processing that runs between sessions to extract and consolidate memories. When an agent session ends, the sleep-time extraction pipeline:
1. Reviews the full conversation transcript
2. Identifies new facts, decisions, and lessons
3. Deduplicates against existing memories
4. Assigns confidence scores and memory types
5. Stores extracted memories in the appropriate block
This happens asynchronously and does not affect request latency.
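Steps 3 to 5 of the pipeline can be sketched as a small consolidation function. This is a minimal illustration under stated assumptions: the `Candidate` shape, the 0.0 to 1.0 confidence scale, the `min_confidence` cutoff, and exact-match deduplication after normalization are all hypothetical (a production system would more plausibly deduplicate by embedding similarity). Steps 1 and 2, which need a model to read the transcript, are left out.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    content: str
    memory_type: str   # target block, e.g. "core" or "archival"
    confidence: float  # hypothetical 0.0 to 1.0 scale


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(text.lower().split())


def consolidate(candidates, store, min_confidence=0.6):
    """Add new, confident candidates to store; return how many were added.

    store maps a block name to the list of memories already in that block.
    """
    seen = {normalize(m) for block in store.values() for m in block}
    added = 0
    for candidate in candidates:
        key = normalize(candidate.content)
        if candidate.confidence < min_confidence or key in seen:
            continue  # low confidence or duplicate of an existing memory
        store.setdefault(candidate.memory_type, []).append(candidate.content)
        seen.add(key)
        added += 1
    return added
```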
### Semantic Cache
Short-lived cache of recent context and responses. The semantic cache serves two purposes:
1. Response caching: Semantically similar queries can return cached responses, reducing cost and latency
2. Context continuity: Recent conversation context is available even if the client does not send full history
Cache entries have configurable TTL based on content type. Factual responses cache longer than creative ones.
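The TTL-by-content-type behavior can be sketched as follows. The TTL values and the `TTLCache` class are hypothetical, and the lookup is by exact key for brevity; a semantic cache would instead match on embedding similarity between queries.

```python
import time

# Hypothetical TTLs in seconds; the real values are configurable.
TTL_BY_CONTENT_TYPE = {"factual": 3600, "creative": 60}


class TTLCache:
    def __init__(self, clock=time.monotonic):
        self._clock = clock      # injectable so expiry can be tested
        self._entries = {}       # key -> (response, expires_at)

    def put(self, key, response, content_type):
        """Store a response with a lifetime chosen by its content type."""
        ttl = TTL_BY_CONTENT_TYPE[content_type]
        self._entries[key] = (response, self._clock() + ttl)

    def get(self, key):
        """Return the cached response, or None if missing or expired."""
        entry = self._entries.get(key)
        if entry is None:
            return None
        response, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # evict lazily on read
            return None
        return response
```

With these example TTLs, a factual answer cached at time 0 is still served two minutes later, while a creative one has already expired.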
## pgvector Backend
All memory blocks are stored in PostgreSQL with the pgvector extension for efficient similarity search:
- Embedding model: Configurable, defaults to a high-quality embedding model via the router
- Index type: HNSW for fast approximate nearest neighbor search
- Dimensions: 1536 (the output size of OpenAI's `text-embedding-3-small` and `text-embedding-ada-002`, among others)
- Distance metric: Cosine distance (1 - cosine similarity)
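For readers unfamiliar with pgvector, the settings above map onto SQL roughly as shown below, held here as Python string constants for use with a driver such as psycopg. The table name, column names, and index name are hypothetical; the `vector(1536)` type, the `hnsw` index method with `vector_cosine_ops`, and the `<=>` cosine distance operator are standard pgvector syntax (HNSW requires pgvector 0.5.0 or later).

```python
# Hypothetical schema: table and column names are illustrative only.
CREATE_TABLE_SQL = """
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS memories (
    id bigserial PRIMARY KEY,
    block text NOT NULL,
    content text NOT NULL,
    embedding vector(1536)
);
"""

# HNSW index using cosine distance as the operator class.
CREATE_INDEX_SQL = """
CREATE INDEX IF NOT EXISTS memories_embedding_hnsw
ON memories USING hnsw (embedding vector_cosine_ops);
"""

# <=> returns cosine distance, so smaller means more similar.
SEARCH_SQL = """
SELECT content, 1 - (embedding <=> %(query)s::vector) AS similarity
FROM memories
WHERE block = %(block)s
ORDER BY embedding <=> %(query)s::vector
LIMIT %(k)s;
"""


def to_vector_literal(values):
    """Format numbers in pgvector's text input form, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(repr(float(v)) for v in values) + "]"
```

Ordering by the raw `<=>` expression, rather than by the computed `similarity` column, lets PostgreSQL use the HNSW index for the nearest-neighbor scan.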
## API Usage
### Store a memory
```bash
curl -X POST https://api.brainstormrouter.com/v1/memory \
  -H "Authorization: Bearer br-your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "content": "The production database is PostgreSQL 16 on DigitalOcean",
    "type": "core",
    "metadata": {"project": "my-app"}
  }'
```
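The same request can be built from Python with only the standard library. This is a sketch, not an official client: `build_store_request` is a hypothetical helper, the `Content-Type` header is an assumption based on the JSON body, and the response format is not documented on this page, so the example stops at constructing the request.

```python
import json
import urllib.request

MEMORY_URL = "https://api.brainstormrouter.com/v1/memory"


def build_store_request(api_key, content, memory_type, metadata=None):
    """Build (but do not send) the POST request that stores one memory."""
    payload = {"content": content, "type": memory_type, "metadata": metadata or {}}
    return urllib.request.Request(
        MEMORY_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",  # assumed for the JSON body
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it.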
### Search memories
```bash
curl "https://api.brainstormrouter.com/v1/memory/search?q=database+configuration&limit=5" \
  -H "Authorization: Bearer br-your-api-key"
```
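When the query text comes from user input, it is safer to let a library encode the query string than to assemble it by hand. A minimal sketch, assuming a hypothetical `build_search_request` helper and the `q` and `limit` parameters from the example above:

```python
import urllib.request
from urllib.parse import urlencode

SEARCH_URL = "https://api.brainstormrouter.com/v1/memory/search"


def build_search_request(api_key, query, limit=5):
    """Build (but do not send) the GET request for a memory search."""
    # urlencode escapes spaces as '+' and percent-encodes reserved characters.
    params = urlencode({"q": query, "limit": limit})
    return urllib.request.Request(
        f"{SEARCH_URL}?{params}",
        headers={"Authorization": f"Bearer {api_key}"},
    )
```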
### Automatic injection
When memory is enabled, relevant memories are automatically injected into the system prompt for each request. Control this with headers:
```
X-BR-Memory: true
X-BR-Memory-Blocks: core,archival
X-BR-Memory-Limit: 2000 # max tokens for memory context
```
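A client might assemble these headers with a small helper like the one below. The function name and defaults are hypothetical, and sending `X-BR-Memory: false` to opt out is an assumption; only the three header names and the example values come from this page.

```python
def memory_headers(enabled=True, blocks=("core", "archival"), max_tokens=2000):
    """Return the memory-control headers to merge into a request's headers."""
    # "false" as the opt-out value is an assumption, not documented here.
    headers = {"X-BR-Memory": "true" if enabled else "false"}
    if enabled:
        headers["X-BR-Memory-Blocks"] = ",".join(blocks)
        headers["X-BR-Memory-Limit"] = str(max_tokens)  # max tokens of memory context
    return headers
```

For example, `memory_headers(blocks=("core",), max_tokens=500)` restricts injection to core facts with a smaller token budget.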
## Isolation
Memory is fully isolated per API key. Organization-level sharing is available on Team and Enterprise plans, with fine-grained access controls for which agents can read or write to shared memory blocks.