
🏗️ Architectural Standard 2.0: Local Brain, Remote Storage

Status: Draft / In-Progress
Date: 2026-01-24
Applies To: All Compute Nodes (memory-alpha, starfleet-compute, risa-recreation)

🚨 The Problem (Why we are changing)

The previous "All-NAS" architecture caused critical stability issues:

  1. Fragility: A momentary network glitch caused active containers (Dashboard, Paperless) to crash or reset.
  2. Permission Hell: Databases (Redis, Postgres) running over NFS struggle with file locking and user mapping, causing "500 Server Error" responses.
  3. Latency: UI interactions were sluggish due to constant network I/O.

🏛️ The Solution: Hybrid Architecture

We are adopting a "Local Brain, Remote Storage" model.

1. Local Brain (Active I/O)

All "hot" files must reside on the compute node's local storage (NVMe/SSD).

  • Path: /opt/docker_data/[service_name]
  • Contents:
      • Configuration files (.yaml, .json, .env)
      • Application databases (Redis, Postgres, SQLite)
      • Docker volumes
  • Benefits: Instant I/O, simple permissions, high reliability.

2. Remote Storage (Bulk Data)

Only "cold" or "large" files remain on the NAS (TrueNAS).

  • Path: /mnt/infra_storage or /mnt/media
  • Contents:
      • Media libraries (Movies, TV, Music)
      • Document archives (PDFs)
      • Backups
  • Benefits: Centralized storage, easy expansion.
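The hot/cold split above maps directly onto a service's volume mounts. A minimal compose sketch, using Paperless as an illustrative example (image tag and container-side paths are assumptions, not verified values):

```yaml
# Hybrid layout sketch: local "brain", remote "storage".
services:
  paperless:
    image: ghcr.io/paperless-ngx/paperless-ngx:latest
    volumes:
      # Hot: config and database files live on the node's local NVMe/SSD
      - /opt/docker_data/paperless/data:/usr/src/paperless/data
      # Cold: the bulk document archive stays on the NAS
      - /mnt/infra_storage/paperless/media:/usr/src/paperless/media
```

The key property is that a NAS outage only makes the cold mount unavailable; the service's config and database keep working from local disk.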

🔐 Media Permission Standard (UID 3000)

To ensure seamless integration with ZFS/NFS network storage, all media-consuming containers (Plex, Audiobookshelf, Lidarr, Navidrome, etc.) MUST adhere to the following:

  1. PUID/PGID: Set to 1000:3000 (User: vivianl, Group: media).
  2. Rationale: The NAS uses specific ACLs where GID 3000 is the designated media group owner. Standardizing services to this pair ensures atomic moves and avoids Permission Denied errors.
  3. Inheritance: Use chmod -R 775 on parent directories to ensure proper group write access for automation.
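In practice, every media-consuming container carries the same PUID/PGID pair in its environment. A hedged compose fragment using Lidarr as the example (image tag and mount paths are illustrative; the PUID/PGID convention is the LinuxServer.io one):

```yaml
services:
  lidarr:
    image: lscr.io/linuxserver/lidarr:latest
    environment:
      - PUID=1000   # user: vivianl
      - PGID=3000   # group: media (matches the NAS ACL owner)
    volumes:
      - /opt/docker_data/lidarr:/config   # hot config on local SSD
      - /mnt/media/music:/music           # bulk media on the NAS
```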

🖖 The "Wormhole" Proxy Standard

All external services must follow the three-tier synchronization protocol:

  1. Tier 1 (Oracle Terminator): Public entry point. Handles SSL termination and host-header routing to Risa (Port 80).
  2. Tier 2 (Risa Gateway): Master router. Routes internally via http:// (to avoid redirect loops with the Oracle Terminator). Must use the verified path: /opt/docker_data/gateway/caddy/Caddyfile.
  3. Tier 3 (Service Node): The compute node that runs the service (Starfleet, Risa, or Holodeck).
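As a sketch, the first two tiers for one hypothetical subdomain could look like this in Caddy (hostnames, upstream names, and ports are illustrative placeholders, not the production values):

```Caddyfile
# Tier 1 — Oracle Terminator: terminates TLS, routes by Host header to Risa
paperless.example.com {
    reverse_proxy risa-recreation:80
}

# Tier 2 — Risa Gateway (/opt/docker_data/gateway/caddy/Caddyfile):
# explicit http:// site block so Caddy does not re-redirect to HTTPS,
# avoiding a loop with Tier 1
http://paperless.example.com {
    reverse_proxy starfleet-compute:8000
}
```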

🛠️ Deployment Verification Protocol

Before marking a service as "Live," the agent must:

  1. Verify Mounts: Run docker inspect to confirm the host path providing the config is correct.
  2. Oracle Sync: Add the new subdomain to the Oracle Caddyfile using the oracle_key.
  3. Panic Check: Tail docker logs for 30 seconds to catch router panics (e.g., empty BASEURL strings).
  4. Signal Path Test: Execute curl tests from Tier 2 to Tier 3, then from Tier 1 to Tier 2.
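The scriptable parts of this checklist can be sketched as a dry-run shell helper (service names and URLs are placeholders; step 2, the Oracle Sync, is a manual Caddyfile edit and is not scripted here; set DRY_RUN=0 to actually run the commands):

```shell
#!/bin/sh
# verify_live.sh — sketch of the Deployment Verification Protocol.
# In dry-run mode (the default) each command is printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi
}

verify() {
  svc="$1"
  # 1. Verify Mounts: config must come from local /opt, not the NAS
  run docker inspect --format '{{range .Mounts}}{{.Source}} -> {{.Destination}}{{println}}{{end}}' "$svc"
  # 2. Oracle Sync: manual edit of the Oracle Caddyfile (not automated here)
  # 3. Panic Check: follow logs for 30 seconds, watching for router panics
  run timeout 30 docker logs --follow "$svc"
  # 4. Signal Path Test: Tier 2 -> Tier 3, then Tier 1 -> Tier 2
  run curl -sS -o /dev/null -w '%{http_code}\n' "http://$svc.internal.lan"
  run curl -sS -o /dev/null -w '%{http_code}\n' "https://$svc.example.com"
}

verify paperless
```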

🔄 Migration Plan

Phase 1: memory-alpha (Immediate)

  • Dashboard: Move config to /opt/docker_data/master-dashboard.
  • Paperless: Move Redis/DB to local Docker volumes. Documents stay on NAS.
  • Scripting: Update deploy.sh to sync configs to /opt instead of /mnt.

Phase 2: starfleet-compute (Next)

  • Plex/Arr: Configs move to local SSD /opt/docker_data. Media stays on /mnt/media.
  • Paperless: Database moves to local storage.

🛡️ Backup Strategy

Since configs are now local, we lose the "automatic" safety of the NAS.

  • New Requirement: A nightly cron job must rsync /opt/docker_data to /mnt/infra_storage/backups/configs.
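A minimal sketch of that job as a system crontab entry (the schedule and rsync flags are suggestions, not mandated values):

```
# /etc/cron.d/config-backup — nightly config sync to the NAS
# --delete keeps the backup an exact mirror; drop it to retain removed files.
30 2 * * * root rsync -a --delete /opt/docker_data/ /mnt/infra_storage/backups/configs/
```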

✅ Success Criteria

  1. Dashboard survives a network disconnect.
  2. Paperless loads without 500 errors.
  3. Plex continues to play media from the NAS.