
# Memory

BrainstormRouter provides a server-side memory system for AI agents that need persistent context across sessions. The 4-block Retrieval-Managed Memory (RMM) architecture stores, indexes, and retrieves relevant context automatically.

## 4-Block Architecture

### Core Facts

High-priority, slowly-changing facts about the user, project, or organization. Core facts are injected into every request's context window.

Examples:

  • Organization name and industry
  • Tech stack and architecture decisions
  • Compliance requirements and security policies

Core facts have the highest retrieval priority and are always included when context space permits.
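The priority rule above can be sketched as a simple token-budget selection. This is an illustrative model, not BrainstormRouter's actual implementation; the function names and the 4-characters-per-token estimate are assumptions:

```python
# Sketch of priority-ordered memory injection under a token budget.
# Core facts are considered first; archival memories fill remaining space.

def estimate_tokens(text: str) -> int:
    """Rough token estimate: ~4 characters per token (illustrative)."""
    return max(1, len(text) // 4)

def select_memories(core, archival, max_tokens):
    """Select memories in priority order until the token budget is exhausted."""
    selected, used = [], 0
    for memory in core + archival:
        cost = estimate_tokens(memory)
        if used + cost > max_tokens:
            continue  # skip memories that don't fit in the remaining budget
        selected.append(memory)
        used += cost
    return selected
```

With a 10-token budget, a 40-character core fact fills the window and a same-sized archival memory is dropped, matching the "included when context space permits" behavior.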

### Archival Memory

Long-term storage for historical context, past decisions, and reference material. Archival memories are retrieved only when semantically relevant to the current request.

The archival block uses pgvector for semantic similarity search. When a request comes in, the system embeds the query and retrieves the top-k most relevant archival memories above a configurable similarity threshold.
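A minimal in-memory sketch of this retrieval step (pgvector performs the equivalent ranking in SQL); the embedding vectors, `k`, and threshold values here are placeholders:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def retrieve(query_vec, memories, k=5, threshold=0.75):
    """Rank stored memories by similarity to the query embedding,
    keep those above the threshold, and return the top-k."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in memories]
    scored = [(s, t) for s, t in scored if s >= threshold]
    scored.sort(reverse=True)
    return [text for _, text in scored[:k]]
```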

### Sleep-Time Extraction

Background processing that runs between sessions to extract and consolidate memories. When an agent session ends, the sleep-time extraction pipeline:

1. Reviews the full conversation transcript

2. Identifies new facts, decisions, and lessons

3. Deduplicates against existing memories

4. Assigns confidence scores and memory types

5. Stores extracted memories in the appropriate block

This happens asynchronously and does not affect request latency.
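The five steps above can be sketched as a post-session job. Everything here is a toy stand-in: a real pipeline would use a model call for extraction, and the marker prefixes, similarity function, and confidence value are assumptions:

```python
def extract_memories(transcript, existing, similarity, dedup_threshold=0.9):
    """Toy sleep-time extraction: pull candidate facts from a transcript,
    drop near-duplicates of existing memories, and tag the survivors."""
    # Steps 1-2: review the transcript and identify candidates (placeholder
    # rule: lines marked as facts or decisions).
    candidates = [line for line in transcript
                  if line.startswith(("FACT:", "DECISION:"))]
    stored = []
    for cand in candidates:
        # Step 3: deduplicate against existing memories.
        if any(similarity(cand, old) >= dedup_threshold for old in existing):
            continue
        # Steps 4-5: assign a type and confidence score, then store.
        mem_type = "core" if cand.startswith("FACT:") else "archival"
        stored.append({"content": cand, "type": mem_type, "confidence": 0.8})
    return stored
```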

### Semantic Cache

Short-lived cache of recent context and responses. The semantic cache serves two purposes:

1. Response caching: Semantically similar queries can return cached responses, reducing cost and latency

2. Context continuity: Recent conversation context is available even if the client does not send full history

Cache entries have configurable TTL based on content type. Factual responses cache longer than creative ones.
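A sketch of a semantic cache with per-type TTLs. The TTL values, class shape, and similarity threshold are illustrative, not the product's actual configuration:

```python
import time

# Illustrative TTLs in seconds: factual responses cache longer than creative ones.
TTL_BY_TYPE = {"factual": 3600, "creative": 60}

class SemanticCache:
    def __init__(self, similarity, threshold=0.95):
        self.entries = []  # (query_vec, response, content_type, stored_at)
        self.similarity = similarity
        self.threshold = threshold

    def put(self, query_vec, response, content_type, now=None):
        now = time.time() if now is None else now
        self.entries.append((query_vec, response, content_type, now))

    def get(self, query_vec, now=None):
        """Return a cached response for a semantically similar, unexpired query."""
        now = time.time() if now is None else now
        for vec, response, ctype, stored_at in self.entries:
            expired = now - stored_at > TTL_BY_TYPE.get(ctype, 60)
            if not expired and self.similarity(query_vec, vec) >= self.threshold:
                return response
        return None
```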

## pgvector Backend

All memory blocks are stored in PostgreSQL with the pgvector extension for efficient similarity search:

  • Embedding model: Configurable, defaults to a high-quality embedding model via the router
  • Index type: HNSW for fast approximate nearest neighbor search
  • Dimensions: 1536 (compatible with most embedding models)
  • Distance metric: Cosine similarity
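A plausible schema matching these settings, using standard pgvector DDL. Table and column names are illustrative; the actual internals may differ:

```sql
-- Illustrative schema; table and column names are assumptions.
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE memories (
    id        bigserial PRIMARY KEY,
    block     text NOT NULL,      -- e.g. core or archival
    content   text NOT NULL,
    embedding vector(1536)        -- 1536 dimensions
);

-- HNSW index with cosine distance for approximate nearest-neighbor search
CREATE INDEX ON memories USING hnsw (embedding vector_cosine_ops);
```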

## API Usage

### Store a memory

```bash
curl -X POST https://api.brainstormrouter.com/v1/memory \
  -H "Authorization: Bearer br-your-api-key" \
  -d '{
    "content": "The production database is PostgreSQL 16 on DigitalOcean",
    "type": "core",
    "metadata": {"project": "my-app"}
  }'
```

### Search memories

```bash
curl "https://api.brainstormrouter.com/v1/memory/search?q=database+configuration&limit=5" \
  -H "Authorization: Bearer br-your-api-key"
```

### Automatic injection

When memory is enabled, relevant memories are automatically injected into the system prompt for each request. Control this with headers:

```
X-BR-Memory: true
X-BR-Memory-Blocks: core,archival
X-BR-Memory-Limit: 2000   # max tokens for memory context
```
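For example, a small client helper that assembles these headers. Only the header names and the `core`/`archival` block names come from the docs above; the function and validation logic are a sketch:

```python
# Block names shown in the docs; others may exist.
KNOWN_BLOCKS = {"core", "archival"}

def memory_headers(enabled=True, blocks=("core", "archival"), token_limit=2000):
    """Build the X-BR-Memory-* headers for a request (illustrative helper)."""
    unknown = set(blocks) - KNOWN_BLOCKS
    if unknown:
        raise ValueError(f"unknown memory blocks: {sorted(unknown)}")
    return {
        "X-BR-Memory": "true" if enabled else "false",
        "X-BR-Memory-Blocks": ",".join(blocks),
        "X-BR-Memory-Limit": str(token_limit),
    }
```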

## Isolation

Memory is fully isolated per API key. Organization-level sharing is available on Team and Enterprise plans, with fine-grained access controls for which agents can read or write to shared memory blocks.