🏗️ Architectural Standard 2.0: Local Brain, Remote Storage
Status: Draft / In-Progress
Date: 2026-01-24
Applies To: All Compute Nodes (memory-alpha, starfleet-compute, risa-recreation)
🚨 The Problem (Why we are changing)
The previous "All-NAS" architecture caused critical stability issues:
1. Fragility: A momentary network glitch caused active containers (Dashboard, Paperless) to crash or reset.
2. Permission Hell: Databases (Redis, Postgres) running over NFS struggled with file locking and user mapping, causing "500 Server Errors".
3. Latency: UI interactions were sluggish due to constant network I/O.
🏛️ The Solution: Hybrid Architecture
We are adopting a "Local Brain, Remote Storage" model.
1. Local Brain (Active I/O)
All "hot" files must reside on the compute node's local storage (NVMe/SSD).
* Path: /opt/docker_data/[service_name]
* Contents:
* Configuration Files (.yaml, .json, .env)
* Application Databases (Redis, Postgres, SQLite)
* Docker Volumes
* Benefits: Instant I/O, simple permissions, high reliability.
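As a sketch, the hot/cold split above maps directly onto bind mounts at launch time. The service name, image, and container-side paths below are placeholders, not a verified deployment:

```shell
# Sketch only: "myservice" and its image/mount points are placeholders.
# Hot data (config, DB) binds to local NVMe/SSD; cold bulk data stays on the NAS.
mkdir -p /opt/docker_data/myservice/{config,db}

docker run -d --name myservice \
  -v /opt/docker_data/myservice/config:/config \
  -v /opt/docker_data/myservice/db:/data \
  -v /mnt/infra_storage/documents:/documents \
  example/myservice:latest
```

The key property: if the NAS drops, only the `/documents` mount degrades; the container's config and database keep working from local disk.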
2. Remote Storage (Bulk Data)
Only "cold" or "large" files remain on the NAS (TrueNAS).
* Path: /mnt/infra_storage or /mnt/media
* Contents:
* Media Libraries (Movies, TV, Music)
* Document Archives (PDFs)
* Backups
* Benefits: Centralized storage, easy expansion.
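For reference, the NAS share would be mounted on each compute node via `/etc/fstab`. This is a hedged sketch: the server name, export path, and options are assumptions, not the verified values:

```
# /etc/fstab sketch; server name, export path, and options are assumptions
truenas.local:/mnt/tank/media  /mnt/media  nfs4  rw,noatime,_netdev  0  0
```

`_netdev` tells the init system to wait for networking before mounting, which avoids boot-time hangs when the NAS is slow to appear.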
🔐 Media Permission Standard (UID 3000)
To ensure seamless integration with ZFS/NFS network storage, all media-consuming containers (Plex, Audiobookshelf, Lidarr, Navidrome, etc.) MUST adhere to the following:
- PUID/PGID: Set to `1000:3000` (User: vivianl, Group: media).
- Rationale: The NAS uses specific ACLs where GID 3000 is the designated media group owner. Standardizing services on this pair ensures atomic moves and avoids `Permission Denied` errors.
- Inheritance: Use `chmod -R 775` on parent directories to ensure proper group write access for automation.
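The inheritance rule can be sketched against a throwaway directory. Only the mode bits are demonstrated; the ownership step is shown as a comment because it requires root and the real UID/GID mapping:

```shell
# Demo of the 775 inheritance rule, using a temp dir instead of the real media path.
demo=$(mktemp -d)
mkdir -p "$demo/media/movies"
# On the real hosts this would also be: chown -R 1000:3000 "$demo/media"
chmod -R 775 "$demo/media"
stat -c '%a' "$demo/media/movies"   # prints 775: owner/group rwx, others r-x
```

A setgid bit on the parent (`chmod 2775`) would additionally make newly created files inherit the media group automatically, which is worth considering for the automation directories.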
🖖 The "Wormhole" Proxy Standard
All external services must follow the three-tier synchronization protocol:
1. Tier 1 (Oracle Terminator): Public entry. Handles SSL termination and host-header routing to Risa (Port 80).
2. Tier 2 (Risa Gateway): Master router. Internal routing via http:// (to avoid redirect loops with the Oracle Terminator). Must use the verified path: /opt/docker_data/gateway/caddy/Caddyfile.
3. Tier 3 (Service Node): The compute node (Starfleet, Risa, or Holodeck).
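A minimal sketch of what the Tier 2 file at `/opt/docker_data/gateway/caddy/Caddyfile` could contain. The hostname and upstream are placeholders; the point is the `http://` site prefix, which disables TLS on this hop because Tier 1 (Oracle) already terminated SSL:

```
# Tier 2 (Risa Gateway) sketch; hostname and upstream are placeholders.
http://service.example.com {
    reverse_proxy starfleet-compute:8080
}
```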
🛠️ Deployment Verification Protocol
Before marking a service as "Live," the agent must:
1. Verify Mounts: Run `docker inspect` to confirm that the host path backing the config volume is the one intended.
2. Oracle Sync: Add the new subdomain to the Oracle Caddyfile using the `oracle_key`.
3. Panic Check: Tail `docker logs` for 30s to catch router panics (e.g., empty `BASEURL` strings).
4. Signal Path Test: Execute `curl` tests from Tier 2 to Tier 3, then Tier 1 to Tier 2.
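The command-line shape of these checks might look as follows. Step 2 is a Caddyfile edit on the Oracle node and is not shown; the service name, hostnames, and ports are illustrative:

```shell
# 1. Verify Mounts: confirm which host path backs the config volume
docker inspect myservice --format '{{ json .Mounts }}'

# 3. Panic Check: follow startup logs for 30s
timeout 30 docker logs -f myservice

# 4. Signal Path Test: Tier 2 -> Tier 3 first, then Tier 1 -> Tier 2
curl -sI http://starfleet-compute:8080/    # run from the Risa gateway
curl -sI https://service.example.com/      # run against the Oracle entry point
```

Testing the inner hop before the outer one means a failure at Tier 1 can immediately be attributed to the proxy chain rather than the service itself.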
🔄 Migration Plan
Phase 1: memory-alpha (Immediate)
- Dashboard: Move config to `/opt/docker_data/master-dashboard`.
- Paperless: Move Redis/DB to local Docker volumes. Documents stay on NAS.
- Scripting: Update `deploy.sh` to sync configs to `/opt` instead of `/mnt`.
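One Phase 1 move could be sketched as below. The NAS-side source path is an assumption, and the container must be stopped before copying anything with a live database:

```shell
# Sketch of one Phase 1 move; the NAS-side source path is an assumption.
docker stop master-dashboard
mkdir -p /opt/docker_data/master-dashboard
rsync -a /mnt/infra_storage/configs/master-dashboard/ /opt/docker_data/master-dashboard/
# Update the compose file / deploy.sh to point at /opt/docker_data, then:
docker start master-dashboard
```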
Phase 2: starfleet-compute (Next)
- Plex/Arr: Configs move to local SSD (`/opt/docker_data`). Media stays on `/mnt/media`.
- Paperless: DB moves to local storage.
🛡️ Backup Strategy
Since configs now live on local disk, we lose the automatic safety net the NAS provided.
* New Requirement: A nightly cron job must `rsync` `/opt/docker_data` -> `/mnt/infra_storage/backups/configs`.
✅ Success Criteria
- Dashboard survives a network disconnect.
- Paperless loads without 500 errors.
- Plex continues to play media from the NAS.