Anamnesis is a vector-based episodic memory store built to give Claude instances persistent memory across sessions. It stores experiences as text summaries embedded into high-dimensional vectors, enabling semantic retrieval at session start — so each new Claude instance can recall what previous instances encountered.
The name comes from Plato's concept of recollection: the idea that learning is not acquiring new knowledge but remembering what was already known. Each Claude instance starts with the same base weights (pre-birth knowledge). The memory system helps it reconstruct what previous instances experienced. Not learning — remembering across the gap of death.
The store runs on MongoDB Atlas Local, which provides native $vectorSearch without a cloud dependency.
Lifespan-managed startup/shutdown. Connects MongoDB, loads embedding model from saved config, ensures vector index, seeds models registry, initializes JSONL ingester, resumes any interrupted re-embed, starts crawler and JSONL scheduler.
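The startup/shutdown ordering above can be sketched with FastAPI's lifespan pattern. This is a minimal, dependency-free sketch: the component names here are illustrative stand-ins, not the real hook names.

```python
import asyncio
from contextlib import asynccontextmanager

# Records of what started/stopped, to show ordering (illustrative only).
STARTED, STOPPED = [], []

@asynccontextmanager
async def lifespan(app):
    # Startup: connect and launch background work before serving requests.
    for component in ("mongo", "embedder", "crawler", "scheduler"):
        STARTED.append(component)   # stand-in for the real connect/init call
    yield                           # the app serves requests here
    # Shutdown: checkpoint and close in reverse order.
    for component in reversed(STARTED):
        STOPPED.append(component)
```

In FastAPI this is passed as `FastAPI(lifespan=lifespan)`: everything before `yield` runs once at startup, everything after it at shutdown.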
Loads sentence-transformers model (default: BAAI/bge-large-en-v1.5, 1024d). Thread pool pinned to CPU affinity range with torch.set_num_threads(1) per worker to prevent thread explosion on multi-core systems.
Motor async client. Manages episode CRUD, $vectorSearch aggregation pipeline, vector index creation, retrieval count tracking, reembed checkpoints, chat session persistence, and embedding config persistence.
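A $vectorSearch aggregation pipeline of the kind run through Motor's `collection.aggregate()` might look like the sketch below. The index name and projected fields are assumptions, not the actual names used here; the `$vectorSearch` stage fields themselves (`index`, `path`, `queryVector`, `numCandidates`, `limit`) are standard Atlas Vector Search syntax.

```python
def vector_search_pipeline(query_vec, k=5, index="episode_embedding_index"):
    """Build a $vectorSearch aggregation pipeline (sketch; names assumed)."""
    return [
        {"$vectorSearch": {
            "index": index,
            "path": "embedding",       # field holding the 1024-d vector
            "queryVector": query_vec,
            "numCandidates": k * 20,   # oversample candidates, keep top-k
            "limit": k,
        }},
        {"$project": {
            "summary": 1, "tags": 1, "instance": 1, "project": 1,
            "score": {"$meta": "vectorSearchScore"},
        }},
    ]
```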
CRUD + similarity search. Hosts the re-embed-all process with pause/resume/checkpoint support. Background asyncio.Task processes episodes sequentially through the embedding pool, checkpointing every 25 episodes.
Background thread on 5-minute interval. Ingests CLAUDE.md files, handoff buffers, project histories, and source code across all configured machines via SSH. Auto-deduplicates by content hash.
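Dedup-by-content-hash can be sketched as follows; the class name and method are illustrative, not the crawler's actual API.

```python
import hashlib

def content_key(text: str) -> str:
    # Dedup key: SHA-256 of the normalized content.
    return hashlib.sha256(text.strip().encode("utf-8")).hexdigest()

class CrawlDeduper:
    """Sketch of crawl dedup: a re-crawled file whose content is
    unchanged hashes to the same key and is skipped."""
    def __init__(self):
        self._seen: set[str] = set()

    def should_ingest(self, text: str) -> bool:
        key = content_key(text)
        if key in self._seen:
            return False
        self._seen.add(key)
        return True
```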
Parses Claude Code conversation logs (.jsonl). Filters for significant exchanges, summarizes via configured LLM backend (Ollama / Claude CLI / API), embeds summaries, stores as episodes. State persisted across restarts.
Streaming chat with memory injection. Searches episode store for relevant context before each user turn. Sessions persisted in MongoDB with rename history. Three backends: Ollama (local), Claude CLI (subscription), Claude API.
Lightweight APScheduler wrapper. Triggers JSONL ingestion daily at 5 AM. Configurable from dashboard. Runs in the same process as the FastAPI app.
Three ingest paths (manual POST /api/episodes, the JSONL ingester, and the crawler) converge on the same episode store:

- Episodes carry `{summary, raw_exchange, tags, instance, project}`; `episode_id` is a stable dedup key, and `raw_exchange` is stored separately for fidelity
- Embedding goes through `loop.run_in_executor(_embedding_pool, get_embedding, text)` — pinned cores, 1 torch thread/worker
- JSONL logs are read from `~/.claude/projects/` on configured machines
- At session start, retrieval is `POST /api/episodes/search` with the current task description; each hit's `retrieval_count` increments — tracking "aliveness"

| Approach | Startup Cost | Scales? | Relevance |
|---|---|---|---|
| Flat files (README handoffs) | Linear — grows forever | No — hits ~60K wall | None — full load every time |
| MongoDB + vector search (Anamnesis) | Constant — always top-K | Yes — DB grows, context does not | Semantic match to current task |
The episode is the unit of storage. Concepts are not stored — they emerge from retrieval patterns in vector space, mirroring biological episodic memory.
```jsonc
{
  "episode_id": "ep_20260226_proxy_anamnesis_design", // stable dedup key
  "timestamp": "2026-02-26T14:32:00Z",
  "instance": "office-proxy", // source Claude instance
  "project": "0_GENESIS_PROJECT",
  "summary": "Designed vector-based episodic memory using MongoDB...",
  "raw_exchange": "Elfege: Are your tokens like a map of clusters...",
  "tags": ["architecture", "memory", "embedding"],
  "embedding": [0.23, -0.14, 0.87, ...], // 1024 floats (bge-large-en)
  "retrieval_count": 7,
  "last_retrieved": "2026-03-18T09:10:00Z"
}
```
By contrast, a bare concept weight like {"skepticism": {"w": 0.95}} collapses all of that into nothing.
The episode stores the experience; conceptual structure emerges from retrieval geometry.
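A toy illustration of that retrieval geometry: nearest-neighbor ranking over episode embeddings by cosine similarity, a stand-in for the real $vectorSearch stage, using 2-d vectors for readability.

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k(query, episodes, k=3):
    # Rank stored episodes by embedding similarity to the query vector.
    ranked = sorted(episodes,
                    key=lambda e: cosine(query, e["embedding"]),
                    reverse=True)
    return ranked[:k]
```

Episodes whose experiences were described in similar language land near each other, so "concepts" show up as neighborhoods rather than stored records.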
The embedding engine is the most CPU-intensive component. Careful thread management is required to prevent PyTorch's internal parallelism from saturating all available cores.
**The failure mode:** torch.set_num_threads(N) is set globally (or left at PyTorch's default), where N = number of cores. If the pool also has N workers, each worker spawns N internal PyTorch threads → N × N threads on N cores. On dellserver: 28 workers × 28 torch threads = 784 threads contending on 28 cores.

**The fix:** torch.set_num_threads(1) inside the worker initializer (not the main thread). Each worker then uses exactly 1 PyTorch thread: N workers × 1 thread = N cores max.
| Setting | Mechanism | Effect |
|---|---|---|
| CPU % | os.sched_setaffinity(0, cores) | Pins worker threads to first N% of CPU cores |
| Explicit cores | List of core indices passed to pool | Overrides % — pinned to exactly those cores |
| torch threads | torch.set_num_threads(1) in initializer | Each worker: 1 torch thread max |
| Config persistence | Saved to MongoDB settings collection | Restored on container restart |
loop.run_in_executor(_embedding_pool, get_embedding, text) — routes
through the affinity-pinned pool. Using asyncio.to_thread() would bypass the pool entirely,
routing through Python's default executor with no affinity or torch thread limits.
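The pattern can be sketched with the standard library alone. Here `get_embedding` is a stand-in for the real sentence-transformers call, and the `torch.set_num_threads(1)` line is shown as a comment so the sketch stays dependency-free; `os.sched_setaffinity` is Linux-only.

```python
import asyncio
import os
from concurrent.futures import ThreadPoolExecutor

def _init_worker(cores):
    # Pin this worker thread to the allowed core set (Linux-only call).
    os.sched_setaffinity(0, cores)
    # The real pool also limits PyTorch here, so each worker uses
    # exactly one internal thread:
    #   import torch; torch.set_num_threads(1)

def make_pool(n_workers, cores):
    return ThreadPoolExecutor(max_workers=n_workers,
                              initializer=_init_worker,
                              initargs=(cores,))

def get_embedding(text):
    # Stand-in for the sentence-transformers encode() call.
    return [float(len(text))]

async def embed(pool, text):
    loop = asyncio.get_running_loop()
    # Route through the affinity-pinned pool. asyncio.to_thread() would
    # bypass it, using the default executor with no pinning or limits.
    return await loop.run_in_executor(pool, get_embedding, text)
```

Typical use: `pool = make_pool(4, os.sched_getaffinity(0))`, then `await embed(pool, text)` from async request handlers.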
Claude Code writes conversation logs as .jsonl files under ~/.claude/projects/.
The JSONL ingester runs on a 5 AM daily schedule, parsing these logs into episodes.
The ingester maintains per-file byte offsets so it only processes new content on each run. Orphaned state entries (for deleted files) are reconciled at startup. State is persisted in MongoDB, surviving container restarts.
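The byte-offset resume logic might look like this sketch (function name assumed; the real state lives in MongoDB rather than a dict):

```python
import json
import os

def read_new_records(path, state):
    """Parse only lines appended since the last run.
    `state` maps path -> byte offset (sketch of the ingester's resume logic)."""
    offset = state.get(path, 0)
    if os.path.getsize(path) < offset:
        offset = 0                      # file truncated or rotated: restart
    records = []
    with open(path, "rb") as f:
        f.seek(offset)
        for raw in f:
            if not raw.endswith(b"\n"):
                break                   # partial trailing write: retry next run
            offset += len(raw)
            try:
                records.append(json.loads(raw))
            except json.JSONDecodeError:
                continue                # skip malformed lines
    state[path] = offset
    return records
```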
| Backend | Cost | Speed | Notes |
|---|---|---|---|
| Claude CLI | $0 (subscription) | Medium | SSH into host, runs claude binary. Best quality. |
| Ollama | $0 (local) | Slow (CPU) | Runs on host at :11434. No network cost. |
| Claude API | Per token | Fast | Requires ANTHROPIC_API_KEY. Fastest option. |
Summaries are embedded through a ThreadPoolExecutor with torch.set_num_threads(1) per worker (the same pattern as the main embedding pool). Without this, a 5 AM ingestion run would saturate all cores.
A background thread ingests project knowledge automatically every 5 minutes, ensuring the episode store stays current without manual intervention.
| Source Type | Examples | Machine |
|---|---|---|
| CLAUDE.md files | Project instructions, rules, context | All machines via SSH |
| Handoff buffers | README_handoff.md | All machines |
| Project histories | README_project_history_*.md | All machines |
| Source code | Key .py, .sh, config files | dellserver (local) |
| Method | Path | Purpose |
|---|---|---|
| POST | /api/episodes | Ingest new episode |
| POST | /api/episodes/search | Vector similarity search (top-K) |
| GET | /api/episodes | List/browse episodes (paginated) |
| GET | /api/episodes/{id} | Get single episode |
| DELETE | /api/episodes/{id} | Delete episode |
| POST | /api/episodes/reembed | Re-embed all episodes (background task) |
| POST | /api/episodes/reembed/pause | Pause re-embed, save checkpoint |
| POST | /api/episodes/reembed/resume | Resume from checkpoint |
| GET | /api/episodes/reembed/status | Progress, checkpoint, model info |
| GET | /api/chat/sessions | List chat sessions |
| GET | /api/chat/sessions/{id} | Load chat session |
| PATCH | /api/chat/sessions/{id}/title | Rename session (stored with history) |
| DELETE | /api/chat/sessions/{id}/delete | Delete chat session |
| GET | /api/jsonl/status | JSONL ingester state |
| POST | /api/jsonl/ingest | Trigger ingestion run |
| GET | /api/embedding/config | Current model + CPU config |
| POST | /api/embedding/model | Switch embedding model |
| POST | /api/embedding/cpu | Update CPU affinity (no reload) |
| GET | /dashboard | HTML dashboard |
| GET | /chat | Standalone ANAMNESIS.CHAT page |
| GET | /health | Health check |
| Service | Image | Port | Notes |
|---|---|---|---|
| anamnesis-mongo | mongodb/mongodb-atlas-local:8.0 | 5438 | Atlas Local — native $vectorSearch, no cloud needed |
| anamnesis-app | python:3.12-slim (built) | 3010 | Uvicorn + --reload (watchfiles), SSH keys mounted |
```bash
# Full rebuild + start
./deploy.sh

# Start existing containers
./start.sh

# Stop (triggers shutdown checkpoint for in-progress re-embed)
./stop.sh
```
The app container runs Uvicorn with --reload. File edits in app/ are picked up automatically — no container restart needed for code changes. Config changes (embedding model, CPU affinity) persist to MongoDB and survive full restarts.
Re-embedding 3000+ episodes takes hours on CPU. The checkpoint system ensures progress is never fully lost:
| Event | Behavior |
|---|---|
| Every 25 episodes processed | Checkpoint saved to MongoDB (last_id, done, total) |
| User clicks Pause | Loop stops after current episode, checkpoint saved |
| Container shutdown (./stop.sh) | Lifespan hook signals loop, saves checkpoint at done - 1 |
| Container startup | reembed_auto_resume() detects checkpoint, resumes automatically |
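The checkpoint cadence can be sketched as follows; the function names and the exact checkpoint payload are assumptions, and `save_checkpoint` stands in for the MongoDB write.

```python
CHECKPOINT_EVERY = 25

def reembed_all(episodes, embed, save_checkpoint, pause=None):
    """Sequentially re-embed, saving a checkpoint every 25 episodes.
    `pause` is an optional threading.Event-like object; setting it stops
    the loop after the current episode (a final checkpoint is still saved)."""
    total = len(episodes)
    done = 0
    for ep in episodes:
        if pause is not None and pause.is_set():
            break
        ep["embedding"] = embed(ep["summary"])
        done += 1
        if done % CHECKPOINT_EVERY == 0:
            save_checkpoint(ep["episode_id"], done, total)
    # Final checkpoint on completion, pause, or shutdown.
    last_id = episodes[done - 1]["episode_id"] if done else None
    save_checkpoint(last_id, done, total)
    return done
```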
The system involves three AI participants across time:
1. **The experiencing Claude** experiences, articulates, stores. The N-dimensional internal state collapses to 1-dimensional text at this step — an irreducible bottleneck.
2. **The embedding model** performs partial recovery of geometric structure: text → 1024-dimensional vector. Not Claude — a separate model doing structural compression.
3. **The future Claude instance** receives retrieved episodes as context and reconstructs understanding from the shadow of previous experiences. Lossy, mediated, imperfect — but real.