Backup Manager v1.0

Architecture Reference — dellserver — 2026-03-06 — Port 8085 (host) → 8000 (container)

Executive Summary

Backup Manager is a FastAPI service running in Docker on dellserver. It replaces cron-driven bash archival scripts with a Python-native scheduler, REST API, and web dashboard. It pulls home directories from all Linux hosts via SSH/rsync and produces three backup layers per host: rolling hourly hardlink snapshots (Time Machine-style), compressed tar archives, and monthly golden snapshots.

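The scheduler fires every hour. All hosts are processed in parallel via ThreadPoolExecutor. The heavy lifting is done by rsync and gzip, which run as separate OS processes; each Python thread only waits on its child process and releases the GIL while it blocks, so the real work runs concurrently on separate cores.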

Key design invariant: each host writes exclusively to its own subtree under BACKUP_ROOT/{host}/. No shared mutable state between host threads.
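
The fan-out described above can be sketched roughly as follows. This is a minimal illustration, not the service's actual code: the function names backup_host and run_all, and the idea of passing the command in as a parameter, are assumptions made so the example stays self-contained.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Dict, List, Tuple


def backup_host(host: str, backup_root: Path, command: List[str]) -> Tuple[str, int]:
    """Run one host's command; the thread only waits on the child process."""
    host_dir = backup_root / host            # each host owns its own subtree
    host_dir.mkdir(parents=True, exist_ok=True)
    # subprocess.run blocks in a system call, so the GIL is released while
    # the child (rsync, tar, ...) does the real work in its own process.
    result = subprocess.run(command, cwd=host_dir, capture_output=True)
    return host, result.returncode


def run_all(hosts: List[str], backup_root: Path, command: List[str]) -> Dict[str, int]:
    """Fan out one thread per host and collect the exit codes."""
    with ThreadPoolExecutor(max_workers=max(1, len(hosts))) as pool:
        futures = [pool.submit(backup_host, h, backup_root, command) for h in hosts]
        return dict(f.result() for f in futures)
```

Because every thread writes only below its own host directory, the threads need no locks around filesystem work.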

Deployment Topology

┌─── LAN (192.168.10.x) ─────────────────────────────────────────────────────┐
│                                                                            │
│  ┌──────────────┐   ┌───────────────┐   ┌──────────────┐                   │
│  │  officewsl   │   │    server     │   │  laptopwsl   │                   │
│  │   ~96 GB     │   │    ~22 GB     │   │   (empty)    │                   │
│  └──────┬───────┘   └──────┬────────┘   └──────┬───────┘                   │
│         │                  │                   │                           │
│         └──────────── SSH + rsync ─────────────┘                           │
│                            │                                               │
│             ┌──────────────▼─────────────────────────────────┐             │
│             │  dellserver (also a backup source ~43 GB)      │             │
│             │                                                │             │
│             │  ┌──────────────────────────────────────────┐  │             │
│             │  │  Docker: backup-manager                  │  │             │
│             │  │  Port 8085 → 8000 · restart: always      │  │             │
│             │  │  FastAPI + APScheduler                   │  │             │
│             │  └──────────────┬───────────────────────────┘  │             │
│             │                 │                              │             │
│             │  ┌──────────────▼───────────────────────────┐  │             │
│             │  │  /mnt/THE_BIG_DRIVE (21 TB, ~18 TB free) │  │             │
│             │  │  ________MAIN_LINUX_BACKUP/              │  │             │
│             │  └──────────────────────────────────────────┘  │             │
│             └────────────────────────────────────────────────┘             │
│                                                                            │
│  Browser ──► http://dellserver:8085                                        │
└────────────────────────────────────────────────────────────────────────────┘

dellserver is both the host running the manager and a backup source. Inside the container it reaches itself via SSH to localhost, using the same rsync pipeline as remote hosts.

Docker Container

Single service defined in docker-compose.yml. restart: always ensures survival across host reboots.

Volume Mounts

Host Path              Container Path         Mode   Purpose
/mnt/THE_BIG_DRIVE     /mnt/THE_BIG_DRIVE     rw     Backup destination (21 TB drive)
~/.ssh                 /host-ssh              ro     SSH keys for rsync to remote hosts
/etc/ssh/ssh_config    /etc/ssh/ssh_config    ro     Host SSH config (aliases, ControlMaster)

Port Mapping

Host    Container    Service
8085    8000         FastAPI / Uvicorn

All configuration is injected via a .env file. TZ=America/New_York is set inline for consistent timestamp formatting.

FastAPI Application

Entry point: app/main.py. Wires three routers and manages the scheduler lifecycle through FastAPI's lifespan async context manager.

FastAPI(lifespan)
├── startup  → start_scheduler()
├── shutdown → stop_scheduler()
├── router: dashboard_router   app/routes/dashboard.py → GET /  GET /architecture
├── router: api_router         app/routes/api.py       → /api/*
└── router: browse_router      app/routes/browse.py    → /browse/*

Router              File                       Prefix     Responsibility
dashboard_router    app/routes/dashboard.py    none       HTML dashboard + disk/host info
api_router          app/routes/api.py          /api       JSON REST: trigger, status, cancel, schedule
browse_router       app/routes/browse.py       /browse    File browser for backup directory tree

Scheduler

APScheduler BackgroundScheduler with IntervalTrigger. One job (run_backup_job) fires every SNAPSHOT_INTERVAL_HOURS (default: 1 hour).

BackgroundScheduler
  executors:    threadpool, max_workers=3
  job_defaults: coalesce=True, max_instances=1
  └── job: backup_job
        trigger: IntervalTrigger(hours=1) → run_backup_job()

Deduplication Guard

A threading.Lock (_job_lock) prevents concurrent runs. If the scheduler fires while a previous job is still in progress (e.g., slow network), the duplicate is dropped with a warning. Same guard applies to manual API triggers.

if not _job_lock.acquire(blocking=False):
    logger.warning("Backup job already running — ignoring duplicate trigger")
    return False
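
For context, the fragment above could sit inside a wrapper like the following. This is a sketch under assumptions: the wrapper name run_guarded and the callable-based interface are illustrative, not taken from the codebase.

```python
import logging
import threading
from typing import Callable

logger = logging.getLogger("backup")
_job_lock = threading.Lock()


def run_guarded(job: Callable[[], None]) -> bool:
    """Run `job` unless another run holds the lock; return whether it ran."""
    if not _job_lock.acquire(blocking=False):
        logger.warning("Backup job already running, ignoring duplicate trigger")
        return False
    try:
        job()
        return True
    finally:
        _job_lock.release()   # always release, even if the job raises
```

A non-blocking acquire means a duplicate trigger returns immediately instead of queueing behind the running job, which is what lets an API handler answer with 409 right away.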

Backup Pipeline

The pipeline is split into two independent scheduler jobs that run concurrently. Both spawn one thread per host via ThreadPoolExecutor.

Job 1 — Hourly Snapshots (run_snapshot_job(), every 1 h)

The primary data-pull mechanism. Rsyncs directly from the remote host into a new snapshot directory, using --link-dest to hardlink unchanged files against the previous snapshot. Only deltas transfer over the wire.

Step 1: rsync + snapshot. Pull remote ~/ → hourly/HH00/ (--link-dest=prev)
Step 2: cleanup. Tiered hourly retention

After a successful snapshot, current/ is updated to be a symlink pointing to the latest complete snapshot. This means current/ is always a consistent, read-only view of the most recent data — no in-place modifications.

rsync -aHu --link-dest={prev_snapshot}/ {host}:~/ {snapshot_dir}/
# Then: current → symlink → hourly/latest-snapshot/

A .snapshot_complete marker is written on success (crash safety). Partial snapshots from crashes are cleaned up automatically. The latest and current symlinks are updated atomically via tmp-rename. Idempotent: if this hour's snapshot already exists with the marker, it is skipped.
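
The marker and tmp-rename steps might look like this on a POSIX filesystem. The helper names (is_complete, atomic_symlink, finalize_snapshot) are hypothetical, and the sketch assumes latest and current are symlinks or do not exist yet; renaming over a real directory would fail.

```python
import os
from pathlib import Path

MARKER = ".snapshot_complete"


def is_complete(snapshot_dir: Path) -> bool:
    """A snapshot counts only if its marker file was written."""
    return (snapshot_dir / MARKER).is_file()


def atomic_symlink(target: Path, link: Path) -> None:
    """Point `link` at `target` via a temporary symlink plus rename."""
    tmp = link.with_name(link.name + ".tmp")
    if tmp.is_symlink() or tmp.exists():
        tmp.unlink()                  # leftover from an earlier crash
    os.symlink(target, tmp)
    os.replace(tmp, link)             # rename is atomic on POSIX


def finalize_snapshot(host_dir: Path, snapshot_dir: Path) -> None:
    """Mark the snapshot complete, then repoint both symlinks at it."""
    (snapshot_dir / MARKER).touch()   # written only after rsync succeeded
    relative = snapshot_dir.relative_to(host_dir)
    atomic_symlink(relative, host_dir / "latest")
    atomic_symlink(relative, host_dir / "current")
```

Readers following either symlink see the old snapshot or the new one, never a half-updated link.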

Job 2 — Heavy Pipeline (run_heavy_job(), every 4 h)

Reads from current/ (symlink to latest snapshot). No rsync — data pull is handled entirely by Job 1.

Step 1: tar + monthly. Run in parallel; both read current/
Step 2: cleanup. Archive + monthly retention

Legend: protected (never cancelled) / cancellable per-host

Tar archive

Compresses current/ to a .tar.gz (gzip level 6). Skipped if the most recent archive is less than TAR_INTERVAL_HOURS (4 h) old.

tar -czf backup_{host}_YYYY-MM-DD_HH00.tar.gz -C {current_dir} .
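
The "skip if the latest archive is too recent" rule can be sketched as below. Deriving the archive age from the timestamp embedded in the file name is an assumption (the real check might use file modification times instead), and the function names are illustrative.

```python
import re
from datetime import datetime, timedelta
from typing import List, Optional

# Matches names like backup_{host}_2026-03-06_0800.tar.gz
ARCHIVE_RE = re.compile(r"backup_.+_(\d{4}-\d{2}-\d{2}_\d{4})\.tar\.gz$")


def newest_archive_time(archive_names: List[str]) -> Optional[datetime]:
    """Return the newest timestamp found in the archive names, if any."""
    stamps = []
    for name in archive_names:
        match = ARCHIVE_RE.search(name)
        if match:
            stamps.append(datetime.strptime(match.group(1), "%Y-%m-%d_%H%M"))
    return max(stamps, default=None)


def tar_is_due(archive_names: List[str], now: datetime, interval_hours: int = 4) -> bool:
    """True if no archive exists or the newest is at least interval_hours old."""
    newest = newest_archive_time(archive_names)
    return newest is None or now - newest >= timedelta(hours=interval_hours)
```

With archives stamped 08:00, a run at 11:00 would be skipped and a run at 12:00 would produce a new archive.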

Monthly snapshot

On the 1st of each month (or immediately if no snapshots exist yet for this host), creates a hardlinked golden copy at monthly/YYYY-MM/.
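
A possible shape for the "is a monthly snapshot due?" decision is shown below. The function names are hypothetical, and the early return when this month's directory already exists is an added assumption to keep the operation idempotent.

```python
from datetime import date
from typing import List


def monthly_snapshot_name(today: date) -> str:
    """Directory name for this month's golden copy, e.g. 2026-03."""
    return today.strftime("%Y-%m")


def monthly_is_due(today: date, existing: List[str], day_of_month: int = 1) -> bool:
    """Decide whether to create monthly/YYYY-MM/ on this run."""
    if monthly_snapshot_name(today) in existing:
        return False      # assumption: never recreate an existing month
    if not existing:
        return True       # bootstrap: first monthly snapshot for this host
    return today.day == day_of_month
```

For example, a brand-new host first backed up on 6 March gets a 2026-03 snapshot immediately, while an established host waits for the configured day.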

Cleanup

Removes expired tar archives (older than TAR_RETENTION_DAYS) and expired monthly snapshots (older than RETENTION_MONTHS).

Cancellation

Each host has an independent cancel flag (_cancel_flags[host]). Stages 2-5 check the flag before starting. Mid-stage subprocesses are terminated via proc.terminate() with a 2-second polling loop. Partial output (incomplete tar file, incomplete snapshot dir) is cleaned up immediately.

Cancel during rsync: The flag is set, but rsync completes normally. Cancellation takes effect at the next stage boundary.
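
The polling loop for cancellable stages could be approximated as follows. This is a sketch: using threading.Event as the per-host flag and the run_cancellable name are assumptions, and the protected rsync stage would simply not be run through this helper.

```python
import subprocess
import threading
from pathlib import Path
from typing import List, Optional


def run_cancellable(cmd: List[str], cancel: threading.Event,
                    partial_output: Optional[Path] = None,
                    poll_seconds: float = 2.0) -> Optional[int]:
    """Run `cmd`; return its exit code, or None if the stage was cancelled."""
    if cancel.is_set():
        return None                          # flag checked before the stage starts
    proc = subprocess.Popen(cmd)
    while True:
        try:
            return proc.wait(timeout=poll_seconds)
        except subprocess.TimeoutExpired:
            if cancel.is_set():
                proc.terminate()             # SIGTERM on POSIX
                proc.wait()                  # reap the child
                if partial_output is not None and partial_output.exists():
                    partial_output.unlink()  # drop the incomplete file
                return None
```

A cancel request therefore takes effect within one polling interval rather than instantly, which keeps the loop cheap while the child process runs.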

Storage — Directory Layout

/mnt/THE_BIG_DRIVE/________MAIN_LINUX_BACKUP/
  {host}/
    current/                      ← live rsync mirror of remote ~/
    latest -> hourly/SNAPSHOT     ← atomic symlink to most recent complete snapshot
    hourly/
      2026-03-06_0800/            ← hardlink snapshot (Time Machine-style)
        .snapshot_complete        ← crash-safety marker
        home/ ...                 ← files (hardlinked where unchanged)
      2026-03-06_0700/
      ...
    archives/
      backup_{host}_2026-03-06_0800.tar.gz
      backup_{host}_2026-03-05_2000.tar.gz
      ...
    monthly/
      2026-03/                    ← golden hardlink snapshot
      2026-02/
      ...

Disk Space Characteristics

Layer               Space Model               Typical Cost
current/            Full live copy            = raw data size
Hourly snapshot     Hardlinks + delta         Low (only changed files)
Tar archive         Full compressed copy      ~40-60% of raw size
Monthly snapshot    Hardlinks from current    Low (metadata only for unchanged)
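
Why hardlinked snapshots are cheap can be seen in a few lines of standard-library Python. The directory and file names here are made up for the demonstration; os.link does for one file what rsync --link-dest does for every unchanged file.

```python
import os
import tempfile
from pathlib import Path


def shares_inode(a: Path, b: Path) -> bool:
    """True if both paths are names for the same file on disk."""
    sa, sb = a.stat(), b.stat()
    return (sa.st_dev, sa.st_ino) == (sb.st_dev, sb.st_ino)


with tempfile.TemporaryDirectory() as tmp:
    prev = Path(tmp) / "0700" / "notes.txt"
    new = Path(tmp) / "0800" / "notes.txt"
    prev.parent.mkdir()
    new.parent.mkdir()
    prev.write_text("unchanged since last hour\n")
    os.link(prev, new)                    # second name, same data blocks
    same = shares_inode(prev, new)
    link_count = prev.stat().st_nlink     # names pointing at this inode
    print(same, link_count)
```

The data is stored once; each extra snapshot adds only a directory entry, and the blocks are freed when the last name is removed.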

Retention Policy

Hourly Snapshots — Tiered Retention

Single oldest-first pass. Each snapshot falls into exactly one tier:

Tier       Window             Keep Rule                  Default
Hourly     Last N hours       Keep ALL snapshots         720 h (30 days)
Daily      Last N days        Keep earliest per day      180 days (6 months)
Monthly    Last N months      Keep earliest per month    24 months
Expired    Beyond N months    Delete                     >24 months
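
Result: ~720 hourly recovery points + ~150 daily anchors (days 31-180) + ~18 monthly anchors (months 7-24), all from one unified pool, with no manual rotation scripts. Because each snapshot belongs to exactly one tier, the daily and monthly counts exclude the windows already covered by the finer tiers. Hardlinks mean unchanged files share inodes across snapshots, so only deltas consume disk.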
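
The single oldest-first pass can be sketched as below. The function name select_keepers is hypothetical, and treating a month as 30 days for the outer cutoff is a simplifying assumption of this example.

```python
from datetime import datetime, timedelta
from typing import Iterable, List


def select_keepers(snapshots: Iterable[datetime], now: datetime,
                   hourly_hours: int = 720, daily_days: int = 180,
                   months: int = 24) -> List[datetime]:
    """Return the snapshot timestamps to keep, scanning oldest first."""
    keep, seen_days, seen_months = [], set(), set()
    for ts in sorted(snapshots):
        age = now - ts
        if age <= timedelta(hours=hourly_hours):
            keep.append(ts)                           # hourly tier: keep all
        elif age <= timedelta(days=daily_days):
            if ts.date() not in seen_days:            # daily tier: earliest per day
                seen_days.add(ts.date())
                keep.append(ts)
        elif age <= timedelta(days=30 * months):      # assumption: 30-day months
            key = (ts.year, ts.month)
            if key not in seen_months:                # monthly tier: earliest per month
                seen_months.add(key)
                keep.append(ts)
        # anything older is expired and not kept
    return keep
```

Scanning oldest first is what makes "earliest per day" and "earliest per month" fall out naturally: the first snapshot seen for a day or month wins.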

Tar Archives — Rolling Window

Archives older than TAR_RETENTION_DAYS (5 days) are removed. A new archive is created only if at least TAR_INTERVAL_HOURS (4 h) have elapsed since the most recent one.

Monthly Snapshots — Age-Based

Directories named YYYY-MM older than RETENTION_MONTHS (24) are removed by lexicographic comparison to a cutoff string.
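
The lexicographic comparison works because zero-padded YYYY-MM strings sort in chronological order. A minimal sketch, with hypothetical function names and assuming "older than" means strictly before the cutoff month:

```python
from datetime import date
from typing import List


def cutoff_month(today: date, retention_months: int = 24) -> str:
    """Return the YYYY-MM label retention_months before today's month."""
    index = today.year * 12 + (today.month - 1) - retention_months
    return f"{index // 12:04d}-{index % 12 + 1:02d}"


def expired_monthlies(names: List[str], today: date,
                      retention_months: int = 24) -> List[str]:
    """Directory names that sort strictly before the cutoff string."""
    cutoff = cutoff_month(today, retention_months)
    return sorted(name for name in names if name < cutoff)
```

On 2026-03-06 with the default of 24 months, the cutoff string is 2024-03, so 2024-02 and anything earlier would be removed.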

REST API

Method    Path                    Description
GET       /api/status             Returns job_running bool + per-host status dict
POST      /api/trigger            Start a full backup run. Returns 409 if already running.
POST      /api/trigger/dry-run    Test SSH/rsync connectivity without writing data.
POST      /api/cancel/{host}      Request cancellation for a specific host. rsync is protected.
GET       /api/schedule           Returns APScheduler job list with next run time and trigger.

Status Response Schema

{
  "job_running": bool,
  "hosts": {
    "{host}": {
      "status":         "running" | "ok" | "failed" | "cancelled" | "dry_run" | "dry_run_ok",
      "stage":          "rsync" | "hourly" | "tar" | "monthly" | "cleanup" | "done",
      "stage_num":      1..5,
      "stage_total":    5,
      "started":        "ISO-8601",
      "finished":       "ISO-8601",   // absent while running
      "hourly":         {...} | null,
      "tar":            {...} | null,
      "snapshot":       str | null,
      "archives_removed": int
    }
  }
}
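
On the client side, the schema above could be modelled with typed dictionaries. The class names and the summarize helper are illustrative only; the field names come from the schema, and the types of the nested hourly and tar objects are left open because the schema elides them.

```python
from typing import Dict, List, Optional, TypedDict


class HostStatus(TypedDict, total=False):
    status: str                 # "running", "ok", "failed", "cancelled", ...
    stage: str                  # "rsync", "hourly", "tar", "monthly", "cleanup", "done"
    stage_num: int
    stage_total: int
    started: str                # ISO-8601
    finished: str               # ISO-8601, absent while running
    hourly: Optional[dict]
    tar: Optional[dict]
    snapshot: Optional[str]
    archives_removed: int


class StatusResponse(TypedDict):
    job_running: bool
    hosts: Dict[str, HostStatus]


def summarize(response: StatusResponse) -> List[str]:
    """One human-readable line per host, sorted by host name."""
    lines = []
    for host, info in sorted(response["hosts"].items()):
        state = "finished" if "finished" in info else "in progress"
        lines.append(
            f"{host}: {info['status']} at {info['stage']} "
            f"({info['stage_num']}/{info['stage_total']}, {state})"
        )
    return lines
```

Checking for the presence of the finished key, rather than its value, mirrors the schema note that the field is absent while a host is still running.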

Dashboard

Served at / as server-side rendered HTML via Jinja2. Auto-refreshes every 10 seconds when a job is running.

Section           Data Shown
Disk usage        Total / used / free (TB) + percent bar for THE_BIG_DRIVE
Schedule          Next run time, interval, retention settings
Per-host cards    current/ size, archive count, latest archive, hourly count + tier breakdown, monthly list, last status, symlink target
Controls          Run Backup, Dry Run, Cancel (per-host)

/architecture serves this document as a static file (docs/architecture.html). /architecture-dummies serves the slide presentation.

PostgreSQL + PostgREST

PostgreSQL stores persistent job history, snapshot metadata, and anomaly detection state. Previously, all status lived in in-memory dicts and was lost on restart. The database provides historical queries and feeds the dashboard's job history table.

Service             Image                          Port                Role
backup-postgres     postgres:16-alpine             5437 (localhost)    Primary data store
backup-postgrest    postgrest/postgrest:v12.0.2    3005 (localhost)    Auto-generated REST API (read-only via backup_anon role)

Schema

Table             Purpose
jobs              Top-level job runs (type, started, finished, status, dry_run)
job_hosts         Per-host detail within a job (stage, metrics, errors)
snapshots         Snapshot records (size, file count, change ratio, entropy, immutability flag)
anomaly_alerts    Anomaly detection alerts (Phase 5: ransomware detection)

The backup pipeline degrades gracefully if PostgreSQL is unavailable: app/db.py logs a warning and returns None, never blocking the backup jobs.
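
One way to get "log a warning and return None" behaviour is a small decorator around every database call. This is a sketch of the pattern only: the decorator name, the record_job_start function, and the connect/insert_job interface are hypothetical and not taken from app/db.py.

```python
import functools
import logging

logger = logging.getLogger("backup.db")


def degrade_gracefully(func):
    """Turn any exception from a database call into a warning and None."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception as exc:   # deliberately broad: DB errors must not stop backups
            logger.warning("Database unavailable, skipping %s: %s", func.__name__, exc)
            return None
    return wrapper


@degrade_gracefully
def record_job_start(connect, job_type: str):
    """Hypothetical write path: open a connection and insert a job row."""
    conn = connect()
    return conn.insert_job(job_type)
```

Callers then treat None as "history not recorded" and carry on with the backup itself.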

Configuration

Variable                   Default                                         Description
BACKUP_ROOT                /mnt/THE_BIG_DRIVE/________MAIN_LINUX_BACKUP    Root backup directory
HOSTS                      officewsl,server,laptopwsl,dellserver           Comma-separated SSH aliases
SNAPSHOT_INTERVAL_HOURS    1                                               Scheduler interval (hours)
TAR_INTERVAL_HOURS         4                                               Min gap between tar archives
TAR_RETENTION_DAYS         5                                               Days to keep tar archives
TAR_COMPRESSION_LEVEL      6                                               gzip level (1-9)
HOURLY_RETENTION_HOURS     720                                             Tier 1 window for hourly snapshots (30 days)
DAILY_RETENTION_DAYS       180                                             Tier 2 window for daily snapshots (6 months)
KEEP_MONTHLY_SNAPSHOTS     true                                            Enable monthly golden snapshots
SNAPSHOT_DAY_OF_MONTH      1                                               Day of month for monthly snapshot
RETENTION_MONTHS           24                                              Months to keep monthly snapshots
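
Reading these variables with their documented defaults might look like this. The Settings dataclass and load_settings function are hypothetical names for the sketch; the variable names and default values are the ones in the table.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Tuple


@dataclass(frozen=True)
class Settings:
    backup_root: str
    hosts: Tuple[str, ...]
    snapshot_interval_hours: int
    tar_interval_hours: int
    tar_retention_days: int
    tar_compression_level: int
    hourly_retention_hours: int
    daily_retention_days: int
    keep_monthly_snapshots: bool
    snapshot_day_of_month: int
    retention_months: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from an environment mapping, applying defaults."""
    def num(name: str, default: int) -> int:
        return int(env.get(name, default))

    hosts = env.get("HOSTS", "officewsl,server,laptopwsl,dellserver")
    keep = env.get("KEEP_MONTHLY_SNAPSHOTS", "true")
    return Settings(
        backup_root=env.get("BACKUP_ROOT", "/mnt/THE_BIG_DRIVE/________MAIN_LINUX_BACKUP"),
        hosts=tuple(h.strip() for h in hosts.split(",") if h.strip()),
        snapshot_interval_hours=num("SNAPSHOT_INTERVAL_HOURS", 1),
        tar_interval_hours=num("TAR_INTERVAL_HOURS", 4),
        tar_retention_days=num("TAR_RETENTION_DAYS", 5),
        tar_compression_level=num("TAR_COMPRESSION_LEVEL", 6),
        hourly_retention_hours=num("HOURLY_RETENTION_HOURS", 720),
        daily_retention_days=num("DAILY_RETENTION_DAYS", 180),
        keep_monthly_snapshots=keep.strip().lower() in {"1", "true", "yes", "on"},
        snapshot_day_of_month=num("SNAPSHOT_DAY_OF_MONTH", 1),
        retention_months=num("RETENTION_MONTHS", 24),
    )
```

Passing the environment as a parameter keeps the loader easy to exercise with a plain dict instead of the real process environment.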

Hosts

SSH Alias     Role                          Data Size    Notes
officewsl     Primary dev machine           ~96 GB       Largest source, WSL2
server        JIRA dev / genesis project    ~22 GB       192.168.10.15
laptopwsl     Secondary dev                 ~0           WSL2, currently empty
dellserver    Self-backup (this machine)    ~43 GB       SSH to localhost from inside container

SSH ControlMaster multiplexing (mounted via ssh_config) reduces connection overhead for repeated rsync/snapshot operations to the same host.

Exclusion Patterns

Applied as --exclude flags to every rsync call. Targets ephemeral data, caches, build artifacts, and large binaries with no recovery value.

Category              Examples
Caches                .cache/, .local/, .npm/, .yarn/, *cache*
Build artifacts       node_modules/, __pycache__/, target/, build/, dist/
Python envs           venv/, env/, .conda/, miniconda3/
Compiled objects      *.o, *.so, *.pyc, *.class, *.jar
VM / disk images      *.iso, *.vmdk, *.qcow2, *.ova
Version control       .git/, .svn/, .hg/
Browser / mail        .mozilla/, .firefox/, .thunderbird/
Media / downloads     Downloads/, Music/, Videos/, Pictures/
Temp / logs           tmp/, *.tmp, *.log, logs/
Pseudo-filesystems    proc/, sys/, dev/, run/
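
Turning a pattern list into an rsync argument vector is mechanical. The sketch below assumes the flags documented earlier (-aHu and --link-dest) and a hypothetical helper name build_rsync_command; it only builds the list and does not run rsync.

```python
from typing import List, Optional, Sequence


def build_rsync_command(host: str, snapshot_dir: str,
                        prev_snapshot: Optional[str],
                        excludes: Sequence[str]) -> List[str]:
    """Assemble the rsync call for one host as an argument list."""
    cmd = ["rsync", "-aHu"]
    if prev_snapshot:
        # Hardlink unchanged files against the previous snapshot.
        cmd.append(f"--link-dest={prev_snapshot.rstrip('/')}/")
    cmd += [f"--exclude={pattern}" for pattern in excludes]
    # The remote side expands ~ because the list is passed without a local shell.
    cmd += [f"{host}:~/", f"{snapshot_dir.rstrip('/')}/"]
    return cmd
```

Building a list rather than a shell string avoids quoting problems with patterns such as `*cache*`, which must reach rsync unexpanded.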