Backup Manager v1.0
Executive Summary
Backup Manager is a FastAPI service running in Docker on dellserver. It replaces cron-driven bash archival scripts with a Python-native scheduler, REST API, and web dashboard. It pulls home directories from all Linux hosts via SSH/rsync and produces three backup layers per host: rolling hourly hardlink snapshots (Time Machine-style), compressed tar archives, and monthly golden snapshots.
The scheduler fires every hour. All hosts are processed in parallel via
a ThreadPoolExecutor, one thread per host. Because rsync and gzip run as
external subprocesses (the Python threads merely wait on them, releasing
the GIL), the real work runs as concurrent OS processes on separate cores.
Each host writes only under its own BACKUP_ROOT/{host}/ subtree, so there
is no shared mutable state between host threads.
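The per-host fan-out described above can be sketched as follows (function and variable names are illustrative, not the actual implementation):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

# Hypothetical per-host worker: the real one shells out to rsync/tar,
# so each thread mostly waits on an external OS process.
def backup_host(host: str) -> tuple[str, str]:
    return host, "ok"

def run_all(hosts: list[str]) -> dict[str, str]:
    # One thread per host; each host writes only under BACKUP_ROOT/{host}/,
    # so the threads share no mutable state.
    results: dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max(1, len(hosts))) as pool:
        futures = {pool.submit(backup_host, h): h for h in hosts}
        for fut in as_completed(futures):
            host, status = fut.result()
            results[host] = status
    return results
```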
Deployment Topology
The service runs in Docker on dellserver and backs up dellserver itself over SSH to localhost,
using the same rsync pipeline as remote hosts.
Docker Container
The application service is defined in docker-compose.yml (alongside the
PostgreSQL and PostgREST services described below). restart: always ensures
survival across host reboots.
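A minimal docker-compose.yml sketch consistent with the mounts, port, and environment described in this document (the service name and build details are assumptions, not copied from the repo):

```
services:
  backup-manager:
    build: .
    restart: always          # survive host reboots
    ports:
      - "8085:8000"          # host 8085 -> Uvicorn on 8000
    environment:
      - TZ=America/New_York  # consistent timestamp formatting
    env_file: .env
    volumes:
      - /mnt/THE_BIG_DRIVE:/mnt/THE_BIG_DRIVE:rw
      - ~/.ssh:/host-ssh:ro
      - /etc/ssh/ssh_config:/etc/ssh/ssh_config:ro
```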
Volume Mounts
| Host Path | Container Path | Mode | Purpose |
|---|---|---|---|
| /mnt/THE_BIG_DRIVE | /mnt/THE_BIG_DRIVE | rw | Backup destination — 21 TB drive |
| ~/.ssh | /host-ssh | ro | SSH keys for rsync to remote hosts |
| /etc/ssh/ssh_config | /etc/ssh/ssh_config | ro | Host SSH config (aliases, ControlMaster) |
Port Mapping
| Host | Container | Service |
|---|---|---|
| 8085 | 8000 | FastAPI / Uvicorn |
All configuration injected via .env file. TZ=America/New_York
set inline for consistent timestamp formatting.
FastAPI Application
Entry point: app/main.py. Wires three routers and manages the scheduler
lifecycle through FastAPI's lifespan async context manager.
| Router | File | Prefix | Responsibility |
|---|---|---|---|
| dashboard_router | app/routes/dashboard.py | none | HTML dashboard + disk/host info |
| api_router | app/routes/api.py | /api | JSON REST — trigger, status, cancel, schedule |
| browse_router | app/routes/browse.py | /browse | File browser for backup directory tree |
Scheduler
APScheduler BackgroundScheduler with IntervalTriggers.
The snapshot job fires every SNAPSHOT_INTERVAL_HOURS (default: 1 hour);
a second, heavier job runs every 4 hours (see Backup Pipeline).
Deduplication Guard
A threading.Lock (_job_lock) prevents concurrent runs.
If the scheduler fires while a previous job is still in progress (e.g., slow network),
the duplicate is dropped with a warning. Same guard applies to manual API triggers.
```python
if not _job_lock.acquire(blocking=False):
    logger.warning("Backup job already running — ignoring duplicate trigger")
    return False
```
Backup Pipeline
The pipeline is split into two independent scheduler jobs that run
concurrently. Both spawn one thread per host via ThreadPoolExecutor.
Job 1 — Hourly Snapshots (run_snapshot_job(), every 1 h)
The primary data-pull mechanism. Rsyncs directly from the remote host
into a new snapshot directory, using --link-dest to hardlink unchanged
files against the previous snapshot. Only deltas transfer over the wire.
{host}:~/  →  hourly/HH00/  (rsync with --link-dest=previous snapshot)
After a successful snapshot, current/ is updated to be a symlink
pointing to the latest complete snapshot. This means current/ is always
a consistent, read-only view of the most recent data — no in-place modifications.
```shell
rsync -aHu --link-dest={prev_snapshot}/ {host}:~/ {snapshot_dir}/
# Then: current → symlink → hourly/latest-snapshot/
```
A .snapshot_complete marker is written on success (crash safety).
Partial snapshots from crashes are cleaned up automatically.
The latest and current symlinks are updated atomically via tmp-rename.
Idempotent: if this hour's snapshot already exists with the marker, it is skipped.
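The atomic tmp-rename symlink update can be sketched as follows (function name hypothetical; the real code may differ):

```python
import os

def update_symlink(link_path: str, target: str) -> None:
    # Create the new symlink under a temporary name, then rename it over
    # the old one. rename() is atomic on POSIX, so readers of current/
    # or latest never observe a missing or half-updated link.
    tmp = link_path + ".tmp"
    if os.path.lexists(tmp):
        os.remove(tmp)
    os.symlink(target, tmp)
    os.replace(tmp, link_path)
```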
Job 2 — Heavy Pipeline (run_heavy_job(), every 4 h)
Reads from current/ (symlink to latest snapshot). No rsync —
data pull is handled entirely by Job 1.
In the pipeline diagram, the tar and monthly stages both read from current/,
followed by retention cleanup. Stage 1 (rsync) is protected and never
cancelled; the remaining stages are cancellable per host.
Tar archive
Compresses current/ to a .tar.gz (gzip level 6).
Skipped if the most recent archive is less than TAR_INTERVAL_HOURS (4 h) old.
```shell
tar -czf backup_{host}_YYYY-MM-DD_HH00.tar.gz -C {current_dir} .
```
Monthly snapshot
On the 1st of each month (or immediately if no snapshots exist yet for this host),
creates a hardlinked golden copy at monthly/YYYY-MM/.
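A hardlinked golden copy is the same trick as `cp -al`; in Python it can be sketched with shutil.copytree (function name hypothetical, and the source should be the resolved snapshot directory, not the symlink itself):

```python
import os
import shutil

def hardlink_snapshot(src: str, dst: str) -> None:
    # Equivalent of `cp -al src dst`: recreate the directory tree and
    # hardlink every regular file, so unchanged data costs only metadata.
    shutil.copytree(src, dst, copy_function=os.link, symlinks=True)
```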
Cleanup
Removes expired tar archives (older than TAR_RETENTION_DAYS),
removes expired monthly snapshots (older than RETENTION_MONTHS).
Cancellation
Each host has an independent cancel flag (_cancel_flags[host]).
Stages 2-5 check the flag before starting. Mid-stage subprocesses are terminated
via proc.terminate() with a 2-second polling loop. Partial output
(incomplete tar file, incomplete snapshot dir) is cleaned up immediately.
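The terminate-then-poll loop can be sketched as (helper name and timeout are illustrative):

```python
import subprocess
import time

def terminate_subprocess(proc: subprocess.Popen, timeout: float = 2.0) -> None:
    # Ask the child to exit (SIGTERM), then poll briefly; escalate to
    # SIGKILL if it ignores the polite request.
    proc.terminate()
    deadline = time.monotonic() + timeout
    while proc.poll() is None and time.monotonic() < deadline:
        time.sleep(0.1)
    if proc.poll() is None:
        proc.kill()
        proc.wait()
```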
Storage — Directory Layout
```
/mnt/THE_BIG_DRIVE/________MAIN_LINUX_BACKUP/
    {host}/
        current -> hourly/SNAPSHOT    ← symlink to most recent complete snapshot
        latest  -> hourly/SNAPSHOT    ← atomic symlink, same target
        hourly/
            2026-03-06_0800/          ← hardlink snapshot (Time Machine-style)
                .snapshot_complete    ← crash-safety marker
                home/ ...             ← files (hardlinked where unchanged)
            2026-03-06_0700/
            ...
        archives/
            backup_{host}_2026-03-06_0800.tar.gz
            backup_{host}_2026-03-05_2000.tar.gz
            ...
        monthly/
            2026-03/                  ← golden hardlink snapshot
            2026-02/
            ...
```
Disk Space Characteristics
| Layer | Space Model | Typical Cost |
|---|---|---|
| current/ (symlink to latest snapshot) | One full copy, shared via hardlinks | = raw data size |
| Hourly snapshot | Hardlinks + delta | Low — only changed files |
| Tar archive | Full compressed copy | ~40-60% of raw size |
| Monthly snapshot | Hardlinks from current | Low — metadata only for unchanged files |
Retention Policy
Hourly Snapshots — Tiered Retention
Single oldest-first pass. Each snapshot falls into exactly one tier:
| Tier | Window | Keep Rule | Default |
|---|---|---|---|
| Hourly | Last N hours | Keep ALL snapshots | 720 h (30 days) |
| Daily | Last N days | Keep earliest per day | 180 days (6 months) |
| Monthly | Last N months | Keep earliest per month | 24 months |
| Expired | Beyond N months | Delete | >24 months |
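The tier classification can be sketched as a pure function (the names and the coarse 31-day month bound are assumptions; the real pass additionally tracks "earliest per day/month" before deciding what to keep):

```python
from datetime import datetime, timedelta

def classify(snapshot_time: datetime, now: datetime,
             hourly_hours: int = 720, daily_days: int = 180,
             monthly_months: int = 24) -> str:
    # Map a snapshot's age onto exactly one retention tier.
    age = now - snapshot_time
    if age <= timedelta(hours=hourly_hours):
        return "hourly"    # keep all
    if age <= timedelta(days=daily_days):
        return "daily"     # keep earliest per day
    if age <= timedelta(days=monthly_months * 31):  # coarse month bound
        return "monthly"   # keep earliest per month
    return "expired"       # delete
```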
Tar Archives — Rolling Window
Archives older than TAR_RETENTION_DAYS (5 days) are removed.
New archives created only if TAR_INTERVAL_HOURS (4 h) have elapsed.
Monthly Snapshots — Age-Based
Directories named YYYY-MM older than RETENTION_MONTHS (24)
are removed by lexicographic comparison to a cutoff string.
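The lexicographic comparison works because YYYY-MM is zero-padded and big-endian, so string order is chronological order. A sketch (function name hypothetical):

```python
def expired_monthly(dirs: list[str], now_year: int, now_month: int,
                    retention_months: int = 24) -> list[str]:
    # Compute the YYYY-MM cutoff string by month arithmetic, then
    # compare directory names to it as plain strings.
    total = now_year * 12 + (now_month - 1) - retention_months
    cutoff = f"{total // 12:04d}-{total % 12 + 1:02d}"
    return [d for d in dirs if d < cutoff]
```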
REST API
| Method | Path | Description |
|---|---|---|
| GET | /api/status | Returns job_running bool + per-host status dict |
| POST | /api/trigger | Start a full backup run. Returns 409 if already running. |
| POST | /api/trigger/dry-run | Test SSH/rsync connectivity without writing data. |
| POST | /api/cancel/{host} | Request cancellation for a specific host. rsync is protected. |
| GET | /api/schedule | Returns APScheduler job list with next run time and trigger. |
Status Response Schema
```
{
  "job_running": bool,
  "hosts": {
    "{host}": {
      "status": "running" | "ok" | "failed" | "cancelled" | "dry_run" | "dry_run_ok",
      "stage": "rsync" | "hourly" | "tar" | "monthly" | "cleanup" | "done",
      "stage_num": 1..5,
      "stage_total": 5,
      "started": "ISO-8601",
      "finished": "ISO-8601",   // absent while running
      "hourly": {...} | null,
      "tar": {...} | null,
      "snapshot": str | null,
      "archives_removed": int
    }
  }
}
```
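As an illustration of consuming this schema, a hypothetical helper (not part of the service) that condenses a status response into one line:

```python
def summarize_status(status: dict) -> str:
    # Condense a /api/status response into a single line,
    # e.g. for a CLI wrapper or a healthcheck script.
    parts = [f"{host}: {info['status']} ({info.get('stage', '?')})"
             for host, info in sorted(status["hosts"].items())]
    prefix = "RUNNING" if status["job_running"] else "IDLE"
    return f"{prefix} | " + ", ".join(parts)
```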
Dashboard
Served at / as server-side rendered HTML via Jinja2.
Auto-refreshes every 10 seconds when a job is running.
| Section | Data Shown |
|---|---|
| Disk usage | Total / used / free (TB) + percent bar for THE_BIG_DRIVE |
| Schedule | Next run time, interval, retention settings |
| Per-host cards | current/ size, archive count, latest archive, hourly count + tier breakdown, monthly list, last status, symlink target |
| Controls | Run Backup, Dry Run, Cancel (per-host) |
/architecture serves this document as a static file
(docs/architecture.html). /architecture-dummies
serves the slide presentation.
PostgreSQL + PostgREST
Persistent job history, snapshot metadata, and anomaly detection state. All status was previously in-memory dicts (lost on restart). The database provides historical queries and feeds the dashboard's job history table.
| Service | Image | Port | Role |
|---|---|---|---|
| backup-postgres | postgres:16-alpine | 5437 (localhost) | Primary data store |
| backup-postgrest | postgrest/postgrest:v12.0.2 | 3005 (localhost) | Auto-generated REST API (read-only via backup_anon role) |
Schema
| Table | Purpose |
|---|---|
| jobs | Top-level job runs (type, started, finished, status, dry_run) |
| job_hosts | Per-host detail within a job (stage, metrics, errors) |
| snapshots | Snapshot records (size, file count, change ratio, entropy, immutability flag) |
| anomaly_alerts | Anomaly detection alerts (Phase 5 — ransomware detection) |
If the database is unavailable, app/db.py logs a warning and returns None,
never blocking the backup jobs.
Configuration
| Variable | Default | Description |
|---|---|---|
| BACKUP_ROOT | /mnt/THE_BIG_DRIVE/________MAIN_LINUX_BACKUP | Root backup directory |
| HOSTS | officewsl,server,laptopwsl,dellserver | Comma-separated SSH aliases |
| SNAPSHOT_INTERVAL_HOURS | 1 | Scheduler interval (hours) |
| TAR_INTERVAL_HOURS | 4 | Min gap between tar archives |
| TAR_RETENTION_DAYS | 5 | Days to keep tar archives |
| TAR_COMPRESSION_LEVEL | 6 | gzip level (1-9) |
| HOURLY_RETENTION_HOURS | 720 | Tier 1 window for hourly snapshots (30 days) |
| DAILY_RETENTION_DAYS | 180 | Tier 2 window for daily snapshots (6 months) |
| KEEP_MONTHLY_SNAPSHOTS | true | Enable monthly golden snapshots |
| SNAPSHOT_DAY_OF_MONTH | 1 | Day of month for monthly snapshot |
| RETENTION_MONTHS | 24 | Months to keep monthly snapshots |
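A sample .env built from the defaults above (values are the documented defaults, not a copy of the live file):

```
BACKUP_ROOT=/mnt/THE_BIG_DRIVE/________MAIN_LINUX_BACKUP
HOSTS=officewsl,server,laptopwsl,dellserver
SNAPSHOT_INTERVAL_HOURS=1
TAR_INTERVAL_HOURS=4
TAR_RETENTION_DAYS=5
TAR_COMPRESSION_LEVEL=6
HOURLY_RETENTION_HOURS=720
DAILY_RETENTION_DAYS=180
KEEP_MONTHLY_SNAPSHOTS=true
SNAPSHOT_DAY_OF_MONTH=1
RETENTION_MONTHS=24
```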
Hosts
| SSH Alias | Role | Data Size | Notes |
|---|---|---|---|
| officewsl | Primary dev machine | ~96 GB | Largest source, WSL2 |
| server | JIRA dev / genesis project | ~22 GB | 192.168.10.15 |
| laptopwsl | Secondary dev | ~0 | WSL2, currently empty |
| dellserver | Self-backup (this machine) | ~43 GB | SSH to localhost from inside container |
SSH ControlMaster connection sharing (configured in the mounted ssh_config) reduces
connection overhead for repeated rsync/snapshot operations to the same host.
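An illustrative ssh_config entry enabling connection sharing (the exact values are assumptions, not copied from the deployment):

```
Host officewsl server laptopwsl dellserver
    ControlMaster auto
    ControlPath ~/.ssh/cm-%r@%h:%p
    ControlPersist 10m
```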
Exclusion Patterns
Applied as --exclude flags to every rsync call. Targets ephemeral data,
caches, build artifacts, and large binaries with no recovery value.
| Category | Examples |
|---|---|
| Caches | .cache/, .local/, .npm/, .yarn/, *cache* |
| Build artifacts | node_modules/, __pycache__/, target/, build/, dist/ |
| Python envs | venv/, env/, .conda/, miniconda3/ |
| Compiled objects | *.o, *.so, *.pyc, *.class, *.jar |
| VM / disk images | *.iso, *.vmdk, *.qcow2, *.ova |
| Version control | .git/, .svn/, .hg/ |
| Browser / mail | .mozilla/, .firefox/, .thunderbird/ |
| Media / downloads | Downloads/, Music/, Videos/, Pictures/ |
| Temp / logs | tmp/, *.tmp, *.log, logs/ |
| Pseudo-filesystems | proc/, sys/, dev/, run/ |
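As a sketch of how the exclusion table might be turned into rsync flags (helper name hypothetical; the real code may build the command differently):

```python
def build_rsync_args(excludes: list[str]) -> list[str]:
    # Expand the exclusion patterns into repeated --exclude flags,
    # as passed to every rsync invocation.
    args = ["rsync", "-aHu"]
    for pattern in excludes:
        args += ["--exclude", pattern]
    return args
```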