Homelab Backup Audit: The 15-Minute Checklist Most Self-Hosted Operators Skip
If your last verified restore was "sometime last year," there's a good chance your backups are silently broken. I know because I've been there: staring at a terminal at 2am, realizing my "perfect" Proxmox backup job had been writing empty SQL files for six months. The pattern is universal across Proxmox, Docker, Unraid, and TrueNAS stacks: backups get configured, work for a while, then fail silently when something upstream changes (volume rename, container migration, disk swap). The audit below catches these failures in 15 minutes against your actual inventory.
By Dale Weaver · Updated May 31, 2026. Tested against my own Proxmox + Docker + Unraid + TrueNAS homelab. The framework below is the basis of The Operator's Cockpit — a 40-prompt pack for solo homelab operators.
Why every long-running homelab eventually breaks backups silently
Backup software runs fine. Logs say success. Storage is healthy. Until you try to restore — and discover one of these:
- The Postgres container migrated from VM-101 to LXC-205 six months ago, but the pg_dump script still points at the old hostname. Result: 6 months of backups are empty SQL files. I found this one the hard way during a "routine" recovery test. Spoiler: it wasn't routine.
- Vaultwarden's bind-mount was renamed during a Docker compose refactor. Backups still run on the old path (which is now empty). The restore is technically valid — but you've been backing up nothing.
- TrueNAS replication job to the offsite NAS is "working," but the offsite NAS ran out of space 90 days ago and silently rejects new snapshots. The local job logs success because it doesn't verify the remote side.
- Plex metadata DB is on a separate Docker volume that was added after the backup script was written. It's never been backed up. You'll only find out the day Plex's transcoder corrupts the watch history — and trust me, losing your curated playlists hurts more than you'd think.
None of these surface in normal logs. The backup tool reports success because the tool's job did succeed — it just operated on the wrong target, the wrong path, or a broken remote. As a fellow operator once told me: "Your backup software isn't the problem. Your assumptions are." Catching this requires a state-level audit that compares what's running against what's being backed up.
The 5-step audit
Step 1 — Inventory your stateful services
Make a flat list. One service per row. Use this format:
name, type, host, last_backup_iso, last_verified_restore_iso
Postgres-prod, database, vm-101, 2026-05-28, 2026-03-15
Vaultwarden, password-mgr, vm-103, 2026-05-28,
Caddy-config, reverse-proxy, vm-101, 2026-05-28,
Plex-metadata, media-db, lxc-201, ,
Home-Assistant, home-automation, lxc-204, 2026-05-29, 2026-04-02
TrueNAS-replication, snapshot-job, nas-01, 2026-05-30,
Nextcloud-data, file-storage, vm-105, 2026-05-29, 2026-02-08
The last_verified_restore_iso column is the one most operators leave blank. That's the column that catches the silent failures. When I first ran this audit on my own setup, I had four blank cells in that column. Embarrassing, but fixable.
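If you keep the inventory as a CSV file, a few lines of Python can load it and list the services whose restore has never been verified. This is a minimal sketch, not part of the original checklist: the function names (`parse_date`, `load_inventory`, `never_verified`) are made up for illustration, and the sample rows are a subset of the format above.

```python
import csv
import io
from datetime import date

# Sample rows in the inventory format above; replace with the contents of your own file.
INVENTORY_CSV = """\
name, type, host, last_backup_iso, last_verified_restore_iso
Postgres-prod, database, vm-101, 2026-05-28, 2026-03-15
Vaultwarden, password-mgr, vm-103, 2026-05-28,
Plex-metadata, media-db, lxc-201, ,
"""


def parse_date(cell):
    """Return a date for an ISO string, or None for a blank or missing cell."""
    cell = (cell or "").strip()
    return date.fromisoformat(cell) if cell else None


def load_inventory(text):
    """Parse the flat inventory into a list of dicts with real date objects."""
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    rows = []
    for raw in reader:
        rows.append({
            "name": raw["name"].strip(),
            "type": raw["type"].strip(),
            "host": raw["host"].strip(),
            "last_backup": parse_date(raw["last_backup_iso"]),
            "last_verified_restore": parse_date(raw["last_verified_restore_iso"]),
        })
    return rows


def never_verified(rows):
    """Names of services with a blank last_verified_restore_iso cell."""
    return [r["name"] for r in rows if r["last_verified_restore"] is None]


if __name__ == "__main__":
    for name in never_verified(load_inventory(INVENTORY_CSV)):
        print(f"never verified: {name}")
```

Parsing the dates up front means a malformed cell (say, `2026-5-28`) raises an error immediately instead of being silently treated as a valid backup date.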
Step 2 — Flag any service with no backup or no verified restore in 90+ days
Apply these rules in order:
- RED FLAG: last_backup_iso is empty or older than 7 days. The service is unprotected.
- YELLOW FLAG: last_backup_iso is recent but last_verified_restore_iso is empty or older than 90 days. The backup exists but you don't know if it works.
- GREEN: recent backup AND verified restore within 90 days.
Most homelabs I've audited have 30-60% red or yellow flags after this single pass — without realizing it. My own was at 40%. Not great, but knowing is half the battle.
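The three rules above are simple enough to encode directly. The sketch below is one possible reading of them, assuming "older than" means strictly more than 7 or 90 days; the `classify` function and its constants are illustrative names, not something from a backup tool.

```python
from datetime import date, timedelta

BACKUP_MAX_AGE = timedelta(days=7)    # RED threshold from the rules above
RESTORE_MAX_AGE = timedelta(days=90)  # YELLOW threshold from the rules above


def classify(last_backup, last_verified_restore, today):
    """Apply the RED / YELLOW / GREEN rules in order. Dates may be None for blank cells."""
    if last_backup is None or today - last_backup > BACKUP_MAX_AGE:
        return "RED"
    if last_verified_restore is None or today - last_verified_restore > RESTORE_MAX_AGE:
        return "YELLOW"
    return "GREEN"


if __name__ == "__main__":
    today = date(2026, 5, 31)
    services = [
        ("Postgres-prod", date(2026, 5, 28), date(2026, 3, 15)),
        ("Vaultwarden", date(2026, 5, 28), None),
        ("Plex-metadata", None, None),
        ("Nextcloud-data", date(2026, 5, 29), date(2026, 2, 8)),
    ]
    for name, backup, restore in services:
        print(f"{classify(backup, restore, today):6} {name}")
```

Checking RED first matters: a service with no backup at all should never be reported as merely YELLOW because its restore column is also blank.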
Step 3 — Cross-check the backup target paths against what's actually running
For each backed-up service, grep your backup config for the actual volume / bind-mount / database path. Compare against the live container or VM's mount table. Look for:
- Path drift: backup config references /srv/postgres-data but the container actually mounts /srv/postgres-prod-data. The backup runs against an empty (or stale) directory.
- Stopped service drift: container was stopped 3 months ago but backup config still includes it. Wastes storage and obscures the active set.
- New service gaps: service was added in the last 90 days but never added to the backup config. Completely unprotected.
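The three kinds of drift can be expressed as a comparison between two mappings: what the backup config targets, and what is actually mounted. The sketch below assumes you have already collected both as dictionaries keyed by service name (for Docker, the output of `docker inspect` includes a `Mounts` section you could read the live paths from). The function name and the sample paths other than the two Postgres ones are hypothetical.

```python
import posixpath


def diff_backup_targets(backup_paths, live_mounts):
    """Compare backup-config paths with live mount paths, both as {service: path}."""
    def norm(path):
        # Normalise so a trailing slash alone is not reported as drift.
        return posixpath.normpath(path)

    path_drift = {
        name: (backup_paths[name], live_mounts[name])
        for name in sorted(backup_paths.keys() & live_mounts.keys())
        if norm(backup_paths[name]) != norm(live_mounts[name])
    }
    return {
        # Backed up and running, but the paths disagree.
        "path_drift": path_drift,
        # In the backup config, but nothing with that name is running.
        "stopped_service_drift": sorted(backup_paths.keys() - live_mounts.keys()),
        # Running, but absent from the backup config.
        "new_service_gaps": sorted(live_mounts.keys() - backup_paths.keys()),
    }


if __name__ == "__main__":
    backup_paths = {
        "postgres": "/srv/postgres-data",
        "old-wiki": "/srv/wiki",
        "caddy": "/srv/caddy/",
    }
    live_mounts = {
        "postgres": "/srv/postgres-prod-data",
        "caddy": "/srv/caddy",
        "plex-metadata": "/srv/plex",
    }
    for kind, found in diff_backup_targets(backup_paths, live_mounts).items():
        print(kind, found)
```

Matching on service name is the weak point of this sketch: if a service was renamed as well as moved, it shows up as one stopped service plus one new gap, which is still worth a look.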
Step 4 — Test one restore. Just one.
Pick the service you'd most regret losing. Restore the most recent backup to a fresh test VM / container. Compare against the live state. Time-box this to 30 minutes — if you can't get a clean restore in 30 minutes for the most critical service, you have a real problem.
For databases: dump the test restore and diff schema + row counts against prod. For file storage: hash-compare a sample directory. For configuration: check that the restored config actually starts the service without errors. I once spent a full Saturday verifying my Home Assistant backup — turned out the database dump was incomplete. Caught it because I actually checked.
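For the file-storage case, hash-comparing a sample directory can be done with the Python standard library alone. This is a rough sketch assuming both the live directory and the restored copy are reachable as local paths; `compare_trees` and its helpers are illustrative names.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of one file, read in chunks so large files do not fill memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def tree_digests(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_trees(live_root, restored_root):
    """Report files that are missing, extra, or different in the restored copy."""
    live = tree_digests(live_root)
    restored = tree_digests(restored_root)
    return {
        "missing_in_restore": sorted(live.keys() - restored.keys()),
        "extra_in_restore": sorted(restored.keys() - live.keys()),
        "content_differs": sorted(
            name for name in live.keys() & restored.keys() if live[name] != restored[name]
        ),
    }
```

Some differences are expected, since live data keeps changing after the backup was taken. What you are looking for is the suspicious pattern: an empty restore, whole directories missing, or files that should be static (configs, attachments) differing.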
Step 5 — Set a 90-day repeat cadence
Calendar entry, every 90 days. Re-run steps 1-4. Backups erode silently as your stack evolves; the only defense is periodic re-verification. The audit takes 15 minutes once you have the format above; the real cost is the discipline of actually doing it. Set a recurring reminder. I use a sticky note on my monitor, too — analog still works.
What this catches that backup tools alone don't
The audit above is a state diff — it compares what's running against what's being backed up. Backup tools report on their own job success; they don't know whether they're targeting the right thing. Specifically, this audit catches:
- Path/hostname drift after migrations or refactors
- Silent remote-storage failures (replication to a full NAS, etc.)
- Newly-added services that were never added to the backup config
- Stopped/deprecated services that still consume backup storage
- Backups that "succeed" but contain stale or empty content
Free 5-prompt sample of the audit pack
I built The Operator's Cockpit — 40 multi-step LLM prompts for solo homelab operators. The free sample is the 5 most-used prompts including the full audit framework above as a copy-paste-ready prompt you can run against your own inventory in an LLM (Claude, GPT-4, Gemini).
Want the audit as a ready-to-run LLM prompt? The free 5-prompt sample pack includes the full backup-audit prompt: paste your service inventory, get a RED/YELLOW/GREEN diagnosis in 15 seconds.
Get the free 5-prompt sample →
Full 40-prompt pack: $29 one-time on Gumroad. No subscription.
Common findings (real examples)
From audits I've run on my own and others' homelabs:
- Postgres in Docker with named volume: backup script targeted /var/lib/docker/volumes/pg_data. Volume was renamed during a compose refactor 4 months prior. Backups had been silently dumping an empty directory.
- TrueNAS replication to Backblaze: job reported success daily. B2 bucket was at 99% capacity and silently rejecting new snapshots for 60 days. Caught by checking the B2 dashboard, not by checking TrueNAS logs.
- Home Assistant config: backup script captured /config. After migrating from Docker to Supervised, the config moved to /usr/share/hassio/homeassistant. Backup script kept running, kept reporting success, kept backing up nothing.
- Vaultwarden: SQLite DB was backed up. The attachments/ directory was NOT included in the backup. Two months of attached files would have been lost on restore.
Every single one of these was an "everything is fine" signal in the tool's own logs. The audit caught them.
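Several of these findings share one symptom: the newest backup artifact is tiny or stale even though the job logged success. A cheap guard is to check the size and age of the latest file. The sketch below is one way to do that; the thresholds are assumptions you would tune per service, and `check_artifact` is an illustrative name.

```python
import time
from pathlib import Path


def check_artifact(path, min_bytes, max_age_hours, now=None):
    """Return a list of problems with one backup file; an empty list means it looks plausible."""
    p = Path(path)
    if not p.is_file():
        return [f"{p.name}: missing"]
    now = time.time() if now is None else now
    stat = p.stat()
    problems = []
    if stat.st_size < min_bytes:
        problems.append(f"{p.name}: only {stat.st_size} bytes (expected at least {min_bytes})")
    age_hours = (now - stat.st_mtime) / 3600
    if age_hours > max_age_hours:
        problems.append(f"{p.name}: last written {age_hours:.0f}h ago (limit {max_age_hours}h)")
    return problems
```

A size floor will not prove a dump is complete, but it would have flagged six months of empty SQL files on day one. It complements the restore test; it does not replace it.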
The 30-second TL;DR
- List every stateful service with last_backup and last_verified_restore dates
- Flag anything older than 7 days (backup) or 90 days (verified restore)
- Cross-check backup target paths against the live mount tables
- Test one restore — for the service you'd most regret losing
- Calendar-repeat every 90 days
If you do this once and find zero issues, your homelab is in better shape than 95% of self-hosted setups. If you find 3+ issues, this audit just saved you a real bad day.
Frequently Asked Questions
What's the minimum backup-restore frequency for a homelab?
Backups daily, verified restores quarterly. The verified-restore cadence is the one most operators get wrong — they assume the backups work and only find out otherwise during an actual disaster.
Does this audit work for Proxmox-only setups?
Yes. The framework is host-agnostic — replace the service-type column with VM/CT IDs and the path-drift check becomes a check of vzdump targets vs. live VM disks. The Operator's Cockpit pack includes a Proxmox-specific variant of the audit prompt.
What if I'm using Borg, Restic, or Kopia?
Same framework — those tools handle the "is the backup running" question well. They don't handle the "is the backup targeting the right thing" question. The audit catches that gap.
How does Operator's Cockpit differ from existing backup tools?
Existing tools (Borg/Restic/Kopia, Proxmox Backup Server, etc.) execute the backup. The Cockpit pack is a set of LLM prompts that audit your configuration — what's being backed up vs. what should be. It's a state-diff tool, not a backup tool. You still need a backup tool.
Beyond the Audit: Proactive Backup Strategies for Student Homelabs
While a 15-minute audit is crucial for catching immediate issues, a robust backup strategy for your homelab, especially as a college student, needs to extend beyond reactive checks. Think proactively about what data is truly irreplaceable and what the academic cost would be if it vanished. This isn't just about your game server saves or media library; it's about your senior thesis, your machine learning project's training data, your research paper drafts, or the complex code for your capstone project. Losing these could mean redoing months of work, missing deadlines, or even impacting your graduation.
For students balancing studies with homelab tinkering, implementing a tiered backup approach is highly effective. Start by identifying your most critical academic data and segregating it. For instance, code repositories might go to a Git service (like GitHub or GitLab) *and* be backed up locally, while research papers could be synced to cloud storage (like OneDrive or Google Drive) *and* included in your regular homelab backups. This layering provides multiple recovery points, dramatically reducing the risk of total data loss.
Consider the "3-2-1 rule" adapted for your student budget and setup: at least three copies of your data, stored on two different types of media, with one copy off-site. For instance, your Proxmox VM backups might be on your NAS (media 1), a copy could be synced to an external hard drive you keep off-campus (media 2, off-site), and the critical files within those VMs could also be mirrored to a cloud service. This multi-pronged approach ensures that even if your entire homelab goes down due to a power surge, theft, or a critical hardware failure, your academic progress remains secure.
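The 3-2-1 rule is easy to check mechanically once you write down where each copy lives. The sketch below assumes the live data counts as one of the three copies, which is the usual reading of the rule; the `Copy` record and the example locations are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    location: str   # free-text label, e.g. "NAS" or "external HDD"
    media: str      # media type, e.g. "nas", "usb-hdd", "cloud"
    offsite: bool   # True if the copy lives away from the homelab


def check_3_2_1(copies):
    """Return the parts of the 3-2-1 rule that a set of copies fails (empty list = passes)."""
    problems = []
    if len(copies) < 3:
        problems.append(f"only {len(copies)} copies, need at least 3")
    if len({c.media for c in copies}) < 2:
        problems.append("all copies are on one media type, need at least 2")
    if not any(c.offsite for c in copies):
        problems.append("no off-site copy")
    return problems


if __name__ == "__main__":
    thesis = [
        Copy("Proxmox VM disk", "ssd", offsite=False),
        Copy("NAS backup", "nas", offsite=False),
        Copy("external HDD off-campus", "usb-hdd", offsite=True),
    ]
    print(check_3_2_1(thesis) or "3-2-1 satisfied")
```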
Automating Verification: From Manual Checks to Scripted Confidence
A manual 15-minute audit is a great starting point, but for true peace of mind and to reduce human error, you'll want to move towards automating as much of your backup verification as possible. This doesn't mean becoming a shell scripting guru overnight, but rather leveraging simple scripts to perform routine checks that go beyond merely confirming a backup job completed successfully. A "successful" log entry doesn't guarantee data integrity; a corrupted backup archive is still a successful file creation.
Start with basic integrity checks. For Proxmox, this might involve running vma verify on a (decompressed) .vma backup archive, or even scripting a small restore to a temporary location to ensure the VM boots. For Docker, verify that a restored container's data volume contains the expected files or that a specific service starts correctly. While a full restore of every backup might be impractical, a rotating verification schedule (where you randomly pick one critical VM or container backup each week to partially restore and test) can catch latent issues before they become critical.
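The "contains the expected files" check for a restored Docker volume can be scripted in a few lines. This is a sketch under assumptions: you have already restored the volume to a scratch directory, and you maintain a short list of paths that must exist for the service to be usable. The function name and the example paths are illustrative.

```python
from pathlib import Path


def missing_or_empty(restore_root, expected):
    """Check that each expected relative path exists under restore_root and is not empty."""
    root = Path(restore_root)
    problems = []
    for rel in expected:
        target = root / rel
        if not target.exists():
            problems.append(f"{rel}: missing")
        elif target.is_file() and target.stat().st_size == 0:
            problems.append(f"{rel}: empty file")
        elif target.is_dir() and not any(target.iterdir()):
            problems.append(f"{rel}: empty directory")
    return problems


if __name__ == "__main__":
    # Hypothetical scratch path and file list for a Vaultwarden-style restore.
    for problem in missing_or_empty("/tmp/restore-test", ["db.sqlite3", "attachments"]):
        print(problem)
```

Listing a directory such as an attachments folder alongside the database file is what catches the case where the database was backed up but the files next to it were not.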
For students new to scripting, tools like ScholarNet AI can be invaluable for generating initial script drafts or explaining complex commands, helping you build robust verification routines without hours of trial and error. You could prompt it for "a bash script to list contents of the latest Proxmox backup for VM 101
