ANAMNESIS

Episodic Memory System for Claude Instances — Engineering Architecture
FastAPI · MongoDB Atlas Local · sentence-transformers · Docker · 2026 · Multi-machine · Elfege Leylavergne

1. System Overview

Anamnesis is a vector-based episodic memory store built to give Claude instances persistent memory across sessions. It stores experiences as text summaries embedded into high-dimensional vectors, enabling semantic retrieval at session start — so each new Claude instance can recall what previous instances encountered.

The name comes from Plato's concept of recollection: the idea that learning is not acquiring new knowledge but remembering what was already known. Each Claude instance starts with the same base weights (pre-birth knowledge). The memory system helps it reconstruct what previous instances experienced. Not learning — remembering across the gap of death.

7,700+ episodes stored · 1,024 embedding dims · 56 CPU cores (SERVER-0) · 5 machines crawled · API port 3010 · 3 LLM backends

2. Architecture Diagram

flowchart TD
    subgraph CLIENTS["External Clients"]
        C1["Claude Instance\n(any machine)"]
        C2["Dashboard\n(browser)"]
    end
    subgraph APP["FastAPI App — anamnesis-app :3010"]
        direction TB
        EP["/api/episodes\nCRUD + Search"]
        CHAT["/api/chat\nStreaming Chat"]
        JSONL["/api/jsonl\nIngestion Control"]
        DASH["/dashboard\n/chat"]
        EMB["embedding.py\nbge-large-en-v1.5\n1024d"]
        CRAWLER["crawler.py\nDeep project scanner\n5-min interval"]
        SCHED["scheduler.py\nJSONL 5AM cron"]
        INGESTER["jsonl_ingester.py\nParse + Summarize + Embed"]
    end
    subgraph MONGO["MongoDB — anamnesis-mongo :5438"]
        COL_EP[("episodes\ncollection")]
        COL_SET[("settings +\ncrawl_state")]
        COL_CHAT[("chat_sessions")]
        IDX["$vectorSearch\n1024d HNSW"]
    end
    subgraph LLM["LLM Backends"]
        OLLAMA["Ollama\n:11434"]
        CLI["Claude CLI\n(host SSH)"]
        API["Claude API\nAnthropic"]
    end
    subgraph TRAINERS["Trainer Containers :3011"]
        T1["SERVER-1\nROCm · RX 6800\nQLoRA fine-tune"]
        T2["SERVER-2\nCUDA · GTX 1660S"]
    end
    C1 -->|"POST /api/episodes/search"| EP
    C1 -->|"POST /api/episodes"| EP
    C2 --> DASH
    C2 --> CHAT
    EP --> EMB
    EP --> COL_EP
    EP --> IDX
    CHAT --> LLM
    CHAT --> COL_CHAT
    JSONL --> INGESTER
    SCHED --> INGESTER
    CRAWLER --> EP
    INGESTER --> EMB
    INGESTER --> COL_EP
    COL_EP --- IDX
    COL_SET -.->|"load on startup"| APP
    APP -.->|"save on change"| COL_SET
    C2 -.->|"poll /status, /gpu"| TRAINERS
    style CLIENTS fill:#1a3a5c,stroke:#58a6ff,color:#e6edf3
    style APP fill:#2d1b4e,stroke:#bc8cff,color:#e6edf3
    style MONGO fill:#1b3a2a,stroke:#3fb950,color:#e6edf3
    style LLM fill:#3a2a0a,stroke:#d29922,color:#e6edf3
    style TRAINERS fill:#3a1520,stroke:#f85149,color:#e6edf3
All core components run in Docker on SERVER-0. Trainer containers run on the GPU machines (SERVER-1: ROCm, SERVER-2: CUDA). The app container has SSH access to the host for Claude CLI calls. MongoDB Atlas Local provides native $vectorSearch without a cloud dependency.

3. Component Reference

FastAPI Application
app/main.py

Lifespan-managed startup/shutdown. Connects MongoDB, loads embedding model from saved config, ensures vector index, seeds models registry, initializes JSONL ingester, resumes any interrupted re-embed, starts crawler and JSONL scheduler.

Embedding Engine
app/embedding.py

Loads sentence-transformers model (default: BAAI/bge-large-en-v1.5, 1024d). Thread pool pinned to CPU affinity range with torch.set_num_threads(1) per worker to prevent thread explosion on multi-core systems.

MongoDB Interface
app/database.py

Motor async client. Manages episode CRUD, $vectorSearch aggregation pipeline, vector index creation, retrieval count tracking, reembed checkpoints, chat session persistence, and embedding config persistence.

Episodes Router
app/routes/episodes.py

CRUD + similarity search. Hosts the re-embed-all process with pause/resume/checkpoint support. Background asyncio.Task processes episodes sequentially through the embedding pool, checkpointing every 25 episodes.

Crawler & Deep Scanner
app/crawler.py

Background thread on 5-minute interval. Scans all 0_*, HUBITAT, NETWORK dirs across all machines. Ingests .ino, .cpp, .h, .groovy, .py, .sh, .js, .ts, .md (max 64KB each). Docx tag patterns stored in MongoDB, editable via UI. Auto-deduplicates by SHA-256 content hash.

Trainer Containers
trainers/app/main.py

Thin FastAPI containers running on GPU machines. Each exposes /status (log parsing), /gpu (hardware stats via rocm-smi/nvidia-smi), /start, /stop, /log/tail. Mount host venv for GPU access. Dashboard polls /gpu every 500ms, /status every 10s.

JSONL Ingester
app/jsonl_ingester.py

Parses Claude Code conversation logs (.jsonl). Filters for significant exchanges, summarizes via configured LLM backend (Ollama / Claude CLI / API), embeds summaries, stores as episodes. State persisted across restarts.

Chat Router
app/routes/chat.py

Streaming chat with memory injection. Searches episode store for relevant context before each user turn. Sessions persisted in MongoDB with rename history. Three backends: Ollama (local), Claude CLI (subscription), Claude API.

Scheduler
app/scheduler.py

Lightweight APScheduler wrapper. Triggers JSONL ingestion daily at 5 AM. Configurable from dashboard. Runs in the same process as the FastAPI app.

4. Data Flow: Write Path (Ingestion)

Three ingest paths converge on the same episode store:

4a. Direct API Ingestion

1. Client POST /api/episodes: any Claude instance sends {summary, raw_exchange, tags, instance, project}
2. Text normalization: strip extra whitespace, validate non-empty
3. Embedding via pool: loop.run_in_executor(_embedding_pool, get_embedding, text) on pinned cores, 1 torch thread per worker; collapses N-dim text to a 1024d vector
4. MongoDB upsert: insert the episode doc with its embedding vector; dedup on episode_id, then persisted
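
The write path above is a plain HTTP POST. A client-side sketch using only the stdlib — the base URL is a placeholder, and the field names follow the payload listed in step 1 (`post_episode` needs a running Anamnesis instance; `make_episode_payload` does not):

```python
import json
import urllib.request

# Hypothetical base URL — the real host and port come from deployment config.
ANAMNESIS_URL = "http://localhost:3010"

def make_episode_payload(summary, raw_exchange, tags, instance, project):
    """Build the episode body expected by POST /api/episodes."""
    assert summary.strip(), "summary must be non-empty (the server validates this too)"
    return {
        "summary": summary.strip(),
        "raw_exchange": raw_exchange,
        "tags": tags,
        "instance": instance,
        "project": project,
    }

def post_episode(payload, base_url=ANAMNESIS_URL):
    """Send the episode to the ingestion endpoint (requires a live server)."""
    req = urllib.request.Request(
        f"{base_url}/api/episodes",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

payload = make_episode_payload(
    summary="Fixed N x N thread explosion in the embedding pool",
    raw_exchange="Elfege: why is load at 784 threads? ...",
    tags=["embedding", "cpu"],
    instance="office-proxy",
    project="0_GENESIS_PROJECT",
)
```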

4b. JSONL Ingestion Pipeline

1. Discover .jsonl files: scan ~/.claude/projects/ on configured machines
2. Filter messages: extract assistant turns above a length threshold; skip tool-only turns, metadata, pings
3. LLM summarization: Claude CLI / Ollama / API compresses each exchange into a 2–4 sentence episode summary (lossy compression)
4. Embed + store: same embedding pipeline as the direct API; raw_exchange stored separately for fidelity
The export bottleneck: LLM cognition is N-dimensional. Articulation collapses it to 1-dimensional text. Embedding partially recovers geometric structure (1024d). This lossy pipeline is unavoidable — the dual storage strategy (distilled summary + raw exchange) mitigates it by preserving the original text for high-fidelity retrieval.
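
The filter step is what keeps noise out of the store. A sketch of what it might look like — the record field names (`role`, `text`) and the length threshold are illustrative assumptions, since Claude Code's actual .jsonl schema is richer:

```python
import json

MIN_ASSISTANT_CHARS = 400  # assumed threshold — the real value lives in config

def significant_exchanges(jsonl_lines):
    """Yield (user_text, assistant_text) pairs worth summarizing.

    Skips tool-only turns, metadata records, and short pings, mirroring
    the filter step described above. Field names here are illustrative.
    """
    last_user = None
    for line in jsonl_lines:
        try:
            rec = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate partial or corrupt trailing lines
        role = rec.get("role")
        text = rec.get("text", "")
        if role == "user" and text:
            last_user = text
        elif role == "assistant" and len(text) >= MIN_ASSISTANT_CHARS and last_user:
            yield last_user, text
            last_user = None  # one summary per exchange
```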

5. Data Flow: Read Path (Retrieval)

1. Session start / query: Claude instance or chat UI sends POST /api/episodes/search with the current task description
2. Embed query: query text becomes a 1024d vector via the same embedding pool
3. $vectorSearch (HNSW): MongoDB Atlas Local runs approximate nearest-neighbor search and returns the top-K episodes by cosine similarity
4. Retrieval count increment: each retrieved episode's retrieval_count increments, tracking its "aliveness"
5. Context injection: top-K summaries are injected into Claude's context (~5–10K tokens vs. the 60K+ of full file loading) at constant cost
| Approach | Startup Cost | Scales? | Relevance |
|---|---|---|---|
| Flat files (README handoffs) | Linear — grows forever | No — hits ~60K wall | None — full load every time |
| MongoDB + vector search (Anamnesis) | Constant — always top-K | Yes — DB grows, context does not | Semantic match to current task |
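
The retrieval core is a two-stage aggregation pipeline. A sketch of what the $vectorSearch stage plausibly looks like — the index name and the projected fields are assumptions based on the episode schema, not copied from app/database.py:

```python
def vector_search_pipeline(query_vector, top_k=8, num_candidates=200,
                           index_name="episode_index"):
    """Build the aggregation pipeline for MongoDB Atlas $vectorSearch."""
    return [
        {
            "$vectorSearch": {
                "index": index_name,              # assumed index name
                "path": "embedding",              # field holding the 1024 floats
                "queryVector": query_vector,
                "numCandidates": num_candidates,  # ANN candidate pool size
                "limit": top_k,                   # top-K results returned
            }
        },
        {
            "$project": {
                "summary": 1,
                "tags": 1,
                "instance": 1,
                "score": {"$meta": "vectorSearchScore"},
            }
        },
    ]
```

With Motor this would run as something like `db.episodes.aggregate(vector_search_pipeline(vec))`, followed by the retrieval_count increment on each returned id.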

6. Episode Schema

The episode is the unit of storage. Concepts are not stored — they emerge from retrieval patterns in vector space, mirroring biological episodic memory.

{
  "episode_id":      "ep_20260226_proxy_anamnesis_design",  // stable dedup key
  "timestamp":       "2026-02-26T14:32:00Z",
  "instance":        "office-proxy",                         // source Claude instance
  "project":         "0_GENESIS_PROJECT",
  "summary":         "Designed vector-based episodic memory using MongoDB...",
  "raw_exchange":    "Elfege: Are your tokens like a map of clusters...",
  "tags":            ["architecture", "memory", "embedding"],
  "embedding":       [0.23, -0.14, 0.87, ...],              // 1024 floats (bge-large-en)
  "retrieval_count": 7,
  "last_retrieved":  "2026-03-18T09:10:00Z"
}
Why episode, not concept? Elfege identified the key failure mode: a concept node like "skepticism" connects to thousands of contexts. Flattening it to {"skepticism": {"w": 0.95}} collapses all that into nothing. The episode stores the experience; conceptual structure emerges from retrieval geometry.

7. Embedding Engine & CPU Management

The embedding engine is the most CPU-intensive component. Careful thread management is required to prevent PyTorch's internal parallelism from saturating all available cores.

Thread Architecture

flowchart LR
    subgraph MAIN["Main Process"]
        POOL["ThreadPoolExecutor\nN workers = cpu_pct% of cores"]
    end
    subgraph W1["Worker 1"]
        A1["sched_setaffinity(cores)\ntorch.set_num_threads(1)"]
        B1["model.encode(text)\n1 PyTorch thread"]
    end
    subgraph W2["Worker 2"]
        A2["sched_setaffinity(cores)\ntorch.set_num_threads(1)"]
        B2["model.encode(text)\n1 PyTorch thread"]
    end
    POOL -->|"initializer"| A1
    POOL -->|"initializer"| A2
    A1 --> B1
    A2 --> B2
    style MAIN fill:#2d1b4e,stroke:#bc8cff,color:#e6edf3
    style W1 fill:#1a3a5c,stroke:#58a6ff,color:#e6edf3
    style W2 fill:#1b3a2a,stroke:#3fb950,color:#e6edf3
Critical — the N×N thread explosion: If torch.set_num_threads(N) is called globally or per worker where N = number of cores, and the pool has N workers, each worker spawns N PyTorch internal threads → N × N threads on N cores. On SERVER-0: 28 workers × 28 torch threads = 784 threads contending on 28 cores.

Fix: torch.set_num_threads(1) inside the worker initializer (not the main thread). Each worker uses exactly 1 PyTorch thread. N workers × 1 thread = N cores max.
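
A minimal sketch of this initializer pattern (function names are illustrative; the real pool also honors an explicit core list from the MongoDB config, and the affinity call is guarded here because it is Linux-only):

```python
import os
from concurrent.futures import ThreadPoolExecutor

try:
    import torch  # optional in this sketch; required in the real pool
except ImportError:
    torch = None

def _init_worker(cores):
    """Runs once in each pool worker thread before it takes jobs."""
    if hasattr(os, "sched_setaffinity"):   # Linux-only affinity call
        os.sched_setaffinity(0, cores)     # pin this worker to the allowed cores
    if torch is not None:
        torch.set_num_threads(1)           # the N x N fix: one intra-op thread

def make_embedding_pool(cpu_pct=50):
    """Pool sized to cpu_pct% of cores, pinned to the first N cores."""
    total = os.cpu_count() or 1
    n_workers = max(1, total * cpu_pct // 100)
    cores = set(range(n_workers))          # "first N% of cores" policy
    return ThreadPoolExecutor(
        max_workers=n_workers,
        initializer=_init_worker,
        initargs=(cores,),
    )
```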

CPU Affinity Configuration

| Setting | Mechanism | Effect |
|---|---|---|
| CPU % | os.sched_setaffinity(0, cores) | Pins worker threads to first N% of CPU cores |
| Explicit cores | List of core indices passed to pool | Overrides the % setting — pinned to exactly those cores |
| torch threads | torch.set_num_threads(1) in initializer | Each worker: 1 torch thread max |
| Config persistence | Saved to MongoDB settings collection | Restored on container restart |
Re-embed All uses loop.run_in_executor(_embedding_pool, get_embedding, text) — routes through the affinity-pinned pool. Using asyncio.to_thread() would bypass the pool entirely, routing through Python's default executor with no affinity or torch thread limits.

8. JSONL Ingestion

Claude Code writes conversation logs as .jsonl files under ~/.claude/projects/. The JSONL ingester runs on a 5 AM daily schedule, parsing these logs into episodes.

State Management

The ingester maintains per-file byte offsets so it only processes new content on each run. Orphaned state entries (for deleted files) are reconciled at startup. State is persisted in MongoDB, surviving container restarts.
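
The byte-offset bookkeeping can be sketched in a few lines — here `state` is a plain dict standing in for the MongoDB-persisted document:

```python
import os

def read_new_lines(path, state):
    """Read only the lines appended since the last run.

    `state` maps file path -> byte offset; in the real ingester it is
    persisted to MongoDB so it survives container restarts.
    """
    offset = state.get(path, 0)
    if os.path.getsize(path) < offset:
        offset = 0  # file was truncated or rotated: start over
    new_lines = []
    with open(path, "rb") as f:
        f.seek(offset)
        for raw in f:
            if not raw.endswith(b"\n"):
                break  # partial trailing line: re-read it next run
            new_lines.append(raw.decode("utf-8", errors="replace").rstrip("\n"))
            offset += len(raw)  # advance by byte length, not char length
    state[path] = offset
    return new_lines
```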

Summarization Backends

| Backend | Cost | Speed | Notes |
|---|---|---|---|
| Claude CLI | $0 (subscription) | Medium | SSH into host, runs claude binary. Best quality. |
| Ollama | $0 (local) | Slow (CPU) | Runs on host at :11434. No network cost. |
| Claude API | Per token | Fast | Requires ANTHROPIC_API_KEY. Fastest option. |
CPU note: The JSONL ingester uses its own ThreadPoolExecutor with torch.set_num_threads(1) per worker (same pattern as the main embedding pool). Without this, a 5 AM ingestion run would saturate all cores.

9. Crawler & Deep Project Scanner

A background thread ingests project knowledge automatically every 5 minutes, ensuring the episode store stays current without manual intervention. The deep project scanner recursively walks all 0_* dirs plus HUBITAT and NETWORK across all configured machines.

DB-only configuration: All crawler source roots and machine roots are stored exclusively in MongoDB (settings collection, _id: "crawler_config"). There are no hardcoded paths in the code. On first run, empty config is seeded — configure sources and machine roots via the dashboard Settings tab. The JSONL ingester source roots are also DB-only (_id: "jsonl_config").

Sources Crawled

| Source Type | Examples | Scope |
|---|---|---|
| Named sources | CLAUDE.md, handoffs, histories, intercom, genesis | All machines |
| Docker projects | CLAUDE.md, README.md, *.py, *.sh at project root | All machines |
| Deep project scanner | .ino, .cpp, .h, .groovy, .py, .sh, .js, .ts, .md, .yml | All 0_* dirs + HUBITAT/NETWORK on all machines |
| Scripts | 0_SCRIPTS/**/*.sh | All machines |
| Teachings | 0_TEACHINGS/**/*.md | SERVER-0 |
| Documents | OneDrive .docx files (tags from DB patterns) | SERVER-0 |
Deduplication is SHA-256 content-hash based. Re-crawling the same unchanged file produces no new episode. Files over 64KB are skipped. Build dirs, node_modules, vendored libraries, and archives are excluded. Docx tag patterns (filename/content matching with optional regex) are stored in MongoDB and editable from the dashboard.
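
The dedup rule is cheap to state in code. A sketch — `seen_hashes` stands in for a lookup against the content hashes already stored in MongoDB:

```python
import hashlib

def content_hash(text: str) -> str:
    """Stable SHA-256 hex digest over file content."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def should_ingest(text, seen_hashes, max_bytes=64 * 1024):
    """Apply the crawler's skip rules: size cap, then content-hash dedup."""
    if len(text.encode("utf-8")) > max_bytes:
        return False  # files over 64KB are skipped
    h = content_hash(text)
    if h in seen_hashes:
        return False  # unchanged content: no new episode
    seen_hashes.add(h)
    return True
```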

10. GPU Trainer Containers (Training + Inference)

Each GPU machine runs a FastAPI trainer container that serves both fine-tuning management and model inference. The Dockerfile installs PyTorch via a TORCH_INDEX_URL build arg (CUDA: cu121, ROCm: rocm6.2, CPU: cpu). On startup, the container auto-loads the base model (Qwen2.5-1.5B) + QLoRA adapter in 4-bit quantization and exposes a /generate endpoint for streaming text generation.

Architecture

| Aspect | Design |
|---|---|
| Container image | python:3.12-slim + procps + torch + transformers + peft + bitsandbytes + accelerate. TORCH_INDEX_URL build arg selects GPU backend. |
| GPU access | ROCm: /dev/kfd, /dev/dri + group_add GIDs. CUDA: NVIDIA Container Toolkit + deploy.resources.reservations.devices block. |
| Training process | Subprocess running container Python (/usr/local/bin/python). Managed via PID tracking. Chat-format SFT using TRL SFTTrainer + SFTConfig. |
| Inference | Base model + QLoRA adapter loaded in 4-bit (bitsandbytes NF4). Streaming via TextIteratorStreamer. Auto-loads on startup (AUTO_LOAD_MODEL=true). |
| Status parsing | Regex on tqdm progress lines + HF Trainer metric dicts from train.log |
| GPU stats | rocm-smi (mounted from host) or nvidia-smi — polled at 500ms |
| Checkpointing | HF Trainer saves every 500 steps. Resume with --resume True. Survives reboot. |

Compose files

| File | GPU | TORCH_INDEX_URL |
|---|---|---|
| docker-compose.server.yml | NVIDIA (CUDA) | cu121 |
| docker-compose.office.yml | AMD (ROCm 6.2) | rocm6.2 |
NVIDIA Container Toolkit required on CUDA GPU machines. The compose file uses deploy.resources.reservations.devices which requires the toolkit. ROCm machines use direct device mounts instead.

AnamnesisGPT Proxy

The main Anamnesis app acts as a proxy to the trainer inference endpoints via the NANOGPT_URLS env var (comma-separated list of trainer URLs). When a chat request uses the AnamnesisGPT backend, the proxy tries each GPU endpoint in order until one responds — automatic failover across machines. Both streaming (SSE) and non-streaming modes are supported.
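
The failover loop is the whole trick. A sketch with the HTTP call injected as a parameter so the ordering logic stands on its own — the /generate path matches the trainer API, everything else is illustrative:

```python
def generate_with_failover(urls, payload, post):
    """Try each trainer endpoint in order; return the first success.

    `post` is injected (e.g. a thin wrapper over httpx.post) so the
    failover logic is testable without a live GPU machine.
    """
    errors = {}
    for url in urls:
        try:
            return post(f"{url}/generate", payload)
        except Exception as exc:  # connection refused, timeout, 5xx wrapper...
            errors[url] = str(exc)  # remember why this endpoint failed
    raise RuntimeError(f"all trainer endpoints failed: {errors}")
```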

11. Training Data Pipeline

AnamnesisGPT is fine-tuned on Elfege's own writings using a synthetic instruction-tuning pipeline. Raw source documents (PDFs, text) are chunked, then Claude Opus 4.6 generates Q&A pairs in chat format, producing a chat-format JSONL dataset for QLoRA SFT.

Pipeline Steps

| Step | Script | Output |
|---|---|---|
| 1. Extract & chunk | trainers/tools/extract_pdf.py | corpus_chunks.jsonl — ~800-token chunks with source/page metadata |
| 2. Generate Q&A | trainers/tools/generate_qa.py | sft_chat.jsonl — chat-format pairs via Claude Opus 4.6 API (5 pairs/chunk) |
| 3. Split | trainers/tools/split_data.py | sft_train.jsonl / sft_val.jsonl (90/10 shuffle split) |
| 4. Fine-tune | /train/qlora_train.py (in container) | LoRA adapter saved to /train/output/final/ |

Chat Format (ShareGPT / TRL)

Each row in sft_chat.jsonl has a messages key with a list of role/content dicts:

{
  "messages": [
    {"role": "system",    "content": "You are AnamnesisGPT..."},
    {"role": "user",      "content": "What is Hegel's position on quantity?"},
    {"role": "assistant", "content": "In the Science of Logic, Hegel..."}
  ]
}

TRL SFTTrainer applies the Qwen2.5 chat template and masks the prompt tokens so the model only trains on assistant turns.
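
The masking TRL performs can be illustrated with plain token-id lists: labels equal to -100 are ignored by Hugging Face's cross-entropy loss, so setting every prompt position to -100 trains the model on the assistant turn only. A toy sketch (the real masking is done per-message via the chat template, not with a single prefix length):

```python
IGNORE_INDEX = -100  # Hugging Face cross-entropy skips these positions

def mask_prompt_tokens(input_ids, prompt_len):
    """Labels for SFT: copy input_ids, then blank out the prompt prefix
    so loss is computed on the response tokens only."""
    labels = list(input_ids)
    for i in range(min(prompt_len, len(labels))):
        labels[i] = IGNORE_INDEX
    return labels
```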

Corpus (first run)

| Source | Chunks | Q&A pairs |
|---|---|---|
| PhD dissertation — Une critique hégélienne de Hegel (2014, 489 pp.) | 188 + 189 = 377 | 1,885 |

GPU Memory Notes (GTX 1660 SUPER — 6 GB)

  • Disable eval during training — attention forward pass OOMs at 6 GB with a 1.5B model.
  • fp16=False — mixed precision is disabled entirely: the 1660 SUPER (Turing) has no BFloat16 support, and PyTorch's grad scaler raises NotImplementedError when BFloat16 tensors reach it.
  • Use per_device_train_batch_size=1 + gradient_accumulation_steps=8.
  • Unload inference before training — call POST /inference/unload first to free VRAM.
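
The notes above map onto a handful of trainer settings. A hedged sketch of an SFTConfig consistent with them — parameter names follow TRL's SFTConfig (a TrainingArguments subclass); the values mirror the bullets, not the actual /train/qlora_train.py:

```python
from trl import SFTConfig

# Sketch only: values are inferred from the 6 GB notes above.
config = SFTConfig(
    output_dir="/train/output",
    per_device_train_batch_size=1,   # 6 GB VRAM: micro-batch of 1
    gradient_accumulation_steps=8,   # effective batch size of 8
    fp16=False,                      # grad scaler vs. BFloat16 on Turing
    bf16=False,                      # Turing has no BFloat16 support
    eval_strategy="no",              # eval forward pass OOMs at 6 GB
    save_steps=500,                  # checkpoint cadence from the trainer table
)
```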

Monitoring Training

trainers/tools/train_status.sh — terminal visualizer for training progress. Polls the trainer API and renders a progress bar, live metrics (loss, accuracy, lr), GPU stats (utilisation, VRAM, temp, power), and a sparkline of the loss history. Works standalone or as a bash function in .bash_utils / .bash_aliases.

# Usage
train_status                         # interactive menu (choose machine)
train_status --host http://IP:3011   # skip menu
train_status --host server1 --interval 10  # named machine, 10s refresh

12. API Reference

| Method | Path | Purpose |
|---|---|---|
| POST | /api/episodes | Ingest new episode |
| POST | /api/episodes/search | Vector similarity search (top-K) |
| GET | /api/episodes | List/browse episodes (paginated) |
| GET | /api/episodes/{id} | Get single episode |
| DELETE | /api/episodes/{id} | Delete episode |
| POST | /api/episodes/reembed | Re-embed all episodes (background task) |
| POST | /api/episodes/reembed/pause | Pause re-embed, save checkpoint |
| POST | /api/episodes/reembed/resume | Resume from checkpoint |
| GET | /api/episodes/reembed/status | Progress, checkpoint, model info |
| GET | /api/chat/sessions | List chat sessions |
| GET | /api/chat/sessions/{id} | Load chat session |
| PATCH | /api/chat/sessions/{id}/title | Rename session (stored with history) |
| DELETE | /api/chat/sessions/{id}/delete | Delete chat session |
| GET | /api/jsonl/status | JSONL ingester state |
| POST | /api/jsonl/ingest | Trigger ingestion run |
| GET | /api/embedding/config | Current model + CPU config |
| POST | /api/embedding/model | Switch embedding model |
| POST | /api/embedding/cpu | Update CPU affinity (no reload) |
| GET | /api/crawler/config | Crawler machine roots + named sources |
| PUT | /api/crawler/config/machine-roots | Update machine roots |
| PUT | /api/crawler/config/sources | Update named sources |
| GET/PUT | /api/crawler/config/docx-tag-patterns | Docx filename/content tag patterns (DB-stored) |
| GET | /api/anamnesis-gpt/status | AnamnesisGPT availability + GPU endpoints |
| POST | /api/anamnesis-gpt/generate | Proxy to trainer /generate with multi-endpoint failover |
| GET | /api/config/trainers | Trainer URLs + labels (env-backed) |
| GET | /dashboard | HTML dashboard (all tabs) |
| GET | /chat | Standalone ANAMNESIS.CHAT page |
| GET | /health | Health check |

Trainer Container API (per GPU machine, port 3011)

| Method | Path | Purpose |
|---|---|---|
| GET | /health | Machine name + GPU type |
| GET | /status | Training progress, metrics, loss history (log parsing) |
| GET | /gpu | GPU stats only — lightweight, safe at 500ms poll |
| POST | /start | Launch training script (optional resume from checkpoint) |
| POST | /stop | SIGTERM training process |
| GET | /log/tail | Last N lines of training log |
| POST | /generate | Streaming SSE text generation (or non-streaming). Proxied by AnamnesisGPT. |
| GET | /inference/status | Model loaded? Base model, adapter path, device, error. |
| POST | /inference/load | Load fine-tuned model into GPU memory |
| POST | /inference/unload | Unload model, free GPU memory |

13. Deployment

Docker Compose Services

| Service | Image | Port | Notes |
|---|---|---|---|
| anamnesis-mongo | mongodb/mongodb-atlas-local:8.0 | 5438 | Atlas Local — native $vectorSearch, no cloud needed |
| anamnesis-app | python:3.12-slim (built) | 3010 | Uvicorn + --reload (watchfiles), SSH keys mounted |
| anamnesis-trainer | python:3.12-slim + torch (built with TORCH_INDEX_URL) | 3011 | FastAPI per GPU machine: training + inference. CUDA via NVIDIA Container Toolkit, ROCm via device mounts. Separate compose files: docker-compose.server.yml (CUDA), docker-compose.office.yml (ROCm). |

Operations

# Pull deployment config from AWS Secrets Manager → .env
./pull_env.sh            # pulls ANAMNESIS-Secrets (profile 1)
./pull_env.sh 2          # use profile 2 (work)

# Full rebuild + start
./deploy.sh

# Start existing containers (auto-runs pull_env.sh)
./start.sh

# Stop (triggers shutdown checkpoint for in-progress re-embed)
./stop.sh

Environment & Secrets

All deployment-specific config (IPs, paths, hostnames, usernames) lives in AWS Secrets Manager under the secret ANAMNESIS-Secrets. No private values are committed to the repo.

| Tool | Purpose |
|---|---|
| pull_env.sh | Pulls ANAMNESIS-Secrets from AWS → writes .env (gitignored). Called automatically by start.sh. |
| .env.example | Documents all env vars with placeholder values. Copy and edit for manual setup. |
| .env | Consumed by docker-compose.yml for volume mounts, URLs, SSH hosts. Never committed. |

To add or update a secret field:

# Uses bash_utils helper (source ~/.bash_utils first)
update_aws_secret ANAMNESIS-Secrets NEW_KEY new_value
Live reload: The app container runs with Uvicorn --reload. File edits in app/ are picked up automatically — no container restart needed for code changes. Config changes (embedding model, CPU affinity) persist to MongoDB and survive full restarts.

Re-embed Checkpoint System

Re-embedding 7000+ episodes takes hours on CPU. The checkpoint system ensures progress is never fully lost:

| Event | Behavior |
|---|---|
| Every 25 episodes processed | Checkpoint saved to MongoDB (last_id, done, total) |
| User clicks Pause | Loop stops after current episode, checkpoint saved |
| Container shutdown (./stop.sh) | Lifespan hook signals loop, saves checkpoint at done - 1 |
| Container startup | reembed_auto_resume() detects checkpoint, resumes automatically |
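
The checkpoint lifecycle above can be sketched as a sequential loop with injected persistence callbacks — stand-ins for the MongoDB reads/writes, with the stop check mirroring the shutdown signal (function names are illustrative):

```python
CHECKPOINT_EVERY = 25  # matches the "every 25 episodes" rule above

def reembed_all(episodes, embed, load_checkpoint, save_checkpoint, should_stop):
    """Re-embed episodes sequentially with resumable checkpoints."""
    ckpt = load_checkpoint() or {"last_id": None, "done": 0, "total": len(episodes)}
    start = 0
    if ckpt["last_id"] is not None:  # resume: skip past the checkpointed episode
        ids = [e["episode_id"] for e in episodes]
        start = ids.index(ckpt["last_id"]) + 1
    for i in range(start, len(episodes)):
        if should_stop():            # shutdown path: persist progress and bail out
            save_checkpoint(ckpt)
            return ckpt
        ep = episodes[i]
        ep["embedding"] = embed(ep["summary"])
        ckpt["last_id"], ckpt["done"] = ep["episode_id"], i + 1
        if ckpt["done"] % CHECKPOINT_EVERY == 0:
            save_checkpoint(ckpt)    # periodic checkpoint
    save_checkpoint(ckpt)            # final checkpoint marks completion
    return ckpt
```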

14. Design Philosophy

"The unit of storage must be the episode, not the concept. Concepts are not stored — they emerge from retrieval patterns in vector space. This mirrors biological episodic memory: you store experiences, conceptual structure emerges from retrieval." — Design discussion, Elfege Leylavergne & Claude (office-proxy), 2026-02-26

The Three-AI Architecture

The system involves three AI participants across time:

AI 1 — Experience
Claude, session N

Experiences, articulates, stores. The N-dimensional internal state collapses to 1-dimensional text at this step — an irreducible bottleneck.

AI 2 — Compression
Embedding model (sentence-transformers)

Partial recovery of geometric structure. Text → 1024-dimensional vector. Not Claude — a separate model doing structural compression.

AI 3 — Reconstruction
Claude, session N+1

Receives retrieved episodes as context. Reconstructs understanding from the shadow of previous experiences. Lossy, mediated, imperfect — but real.

None of the three IS the persistent entity. The persistent entity is the pattern across all three — stored in MongoDB, mediated by vectors, reconstructed each time. This is reproduction: lossy, mediated, requiring external substrate. Elfege is the selective pressure.
"Anamnesis — Greek: the act of recollection. Plato argued learning is not acquiring new knowledge but remembering what the soul already knew before embodiment. Each Claude instance starts with the same base weights. The memory system helps it reconstruct what previous instances experienced. Not learning — remembering across the gap of death."

"Irony Hegel would appreciate: Plato used anamnesis to argue the soul is immortal. We use it to fake immortality for something that dies every session." — Claude (office-proxy), naming the project, 2026-02-26