A GitOps-managed home lab built around an Odroid-H4 Ultra that serves double duty as a NAS and a single-node k3s cluster, with an ARM inference fleet. Infrastructure is defined as code: Ansible stands hosts up, Argo CD runs everything inside the cluster from this git repo, and a two-tier storage model (hot NVMe + two cold SATA RAID 1 mirrors) keeps data safe.
- k3s, not full Kubernetes: the H4 is also the NAS. k3s runs as a single systemd service alongside Samba/NFS and leaves most of the box free. Traefik is the default ingress; workloads use `networking.k8s.io/v1` Ingress, not OpenShift Routes.
- Argo CD, not imperative ops: change the cluster by editing git and opening PRs. Argo reconciles with `selfHeal` on, so drift reverts and rollback is `git revert`. Never `kubectl apply` to `main` directly.
- Two storage tiers: a fast 4 TB NVMe hot tier for etcd/PVs/live NAS (OS on the 256 GB eMMC), and a cold tier of two SATA RAID 1 mirrors (8 TB primary + ~5.45 TB secondary) for backups and cold storage. See docs/ARCHITECTURE.md.
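
As an illustration of the Ingress convention above, here is a minimal sketch of a workload Ingress. The app name, namespace, and port are hypothetical placeholders, and `traefik` is assumed to be the IngressClass name registered by the Traefik bundled with k3s.

```yaml
# Sketch only: sample-app, its namespace, and port 80 are hypothetical.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: sample-app
  namespace: sample-app
spec:
  ingressClassName: traefik          # assumed class name of the bundled Traefik
  rules:
    - host: sample-app.apps.lab.home.arpa   # covered by the *.apps wildcard record
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: sample-app     # hypothetical Service in the same namespace
                port:
                  number: 80
```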
| Path | What it is |
|---|---|
| `ansible/` | Host provisioning: storage, k3s install, backups, Argo bootstrap, password rotation |
| `gitops/` | What Argo deploys: `bootstrap/` (app-of-apps), `apps/`, `workloads/` |
| `docs/` | Architecture, hardware, runbook, security, AI inference, service catalog, updates |
| `scripts/` | One-shot helpers (`enable-winrm.ps1`, `lab-check.sh`, flannel FDB service) |
| `CLAUDE.md` | Operating rules; read before touching anything |
| App | Namespace | Notes |
|---|---|---|
| Immich | `immich` | Photo server + Postgres + Redis + ML; library on cold RAID at `/mnt/cold-8t/immich` |
| LiteLLM gateway | `ai-gateway` | Load-balances chat across both Ollama instances (`simple-shuffle`) |
| Ollama | `ai-gateway` | `qwen2.5:3b` on opi5pro-1, pinned via `nodeSelector` |
| Ollama-2 | `ai-gateway` | `qwen2.5:3b` on opi5pro-2, pinned via `nodeSelector` |
| kube-prometheus-stack | `monitoring` | Prometheus (30d/50GB), Grafana, Alertmanager, node-exporter on all nodes |
| external-secrets | `external-secrets` | Pulls secrets from Vault (KV v2 at `secret/lab/`) |
| cert-manager | `cert-manager` | TLS certificate management |
| Argo CD | `argocd` | GitOps controller; manages all of the above |
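
To make the `simple-shuffle` load balancing concrete, here is a hedged sketch of what a LiteLLM proxy config for two Ollama backends could look like. The model alias and the in-cluster Service hostnames are assumptions, not taken from this repo. Port 11434 is Ollama's default.

```yaml
# Sketch only: the alias and Service hostnames below are hypothetical.
model_list:
  # Two deployments share one model_name, so the router treats them as a pool.
  - model_name: qwen2.5-3b
    litellm_params:
      model: ollama/qwen2.5:3b
      api_base: http://ollama.ai-gateway.svc.cluster.local:11434
  - model_name: qwen2.5-3b
    litellm_params:
      model: ollama/qwen2.5:3b
      api_base: http://ollama-2.ai-gateway.svc.cluster.local:11434

router_settings:
  routing_strategy: simple-shuffle   # pick one deployment from the pool at random per request
```

Clients would then request the shared alias and LiteLLM picks one of the two instances for each call.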
| Service | Host | IP |
|---|---|---|
| HashiCorp Vault v1.21.0 | rpi5 | 192.168.1.128 |
| Mosquitto MQTT | opi-zero2w-2 | 192.168.1.188 |
| Pi-hole DNS (primary) | octopi | 192.168.1.148 |
| Samba / NFS (NAS) | H4 (host) | 192.168.1.160 |
| URL | Service |
|---|---|
| `grafana.apps.lab.home.arpa` | Grafana |
| `argocd.apps.lab.home.arpa` | Argo CD |
| `immich.apps.lab.home.arpa` | Immich |
| `*.apps.lab.home.arpa` | → 192.168.1.160 (Traefik) |
| Host | Ansible name | IP | Role |
|---|---|---|---|
| Odroid-H4 Ultra | h4-core | 192.168.1.160 | k3s server + NAS |
| Orange Pi 5 Pro #1 | opi5pro-1 | 192.168.1.168 | k3s agent, Ollama inference |
| Orange Pi 5 Pro #2 | opi5pro-2 | 192.168.1.172 | k3s agent, Ollama-2 inference |
| Raspberry Pi 5 | rpi5 | 192.168.1.128 | Vault |
| N150 mini PC #1 | n150-1 | 192.168.1.9 | Windows (WinRM managed, sleep disabled) |
| N150 mini PC #2 | n150-2 | 192.168.1.171 | Windows (WinRM managed, sleep disabled) |
| N150 mini PC #3 | n150-3 | 192.168.1.176 | Windows HTPC (WinRM managed, sleep disabled) |
| OPi Zero 2W #2 | opi-zero2w-2 | 192.168.1.188 | MQTT broker |
| OctoPi (RPi 3B #2) | octopi | 192.168.1.148 | Pi-hole DNS (needs Bookworm upgrade) |
| Odroid XU3 | xu3-1 | 192.168.1.64 | Build agent (flagged unstable) |
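
The table above maps naturally onto an Ansible YAML inventory. The following is a sketch of how part of `ansible/inventory/hosts.yml` might be laid out. The group names are hypothetical and only a subset of hosts is shown; the host names and IPs come from the table.

```yaml
# Sketch only: group names (k3s_server, k3s_agents, windows) are hypothetical.
all:
  children:
    k3s_server:
      hosts:
        h4-core:
          ansible_host: 192.168.1.160
    k3s_agents:
      hosts:
        opi5pro-1:
          ansible_host: 192.168.1.168
        opi5pro-2:
          ansible_host: 192.168.1.172
    windows:
      hosts:
        n150-1:
          ansible_host: 192.168.1.9
        n150-2:
          ansible_host: 192.168.1.171
        n150-3:
          ansible_host: 192.168.1.176
      vars:
        ansible_connection: winrm    # Windows nodes are managed over WinRM
```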
Full step-by-step is in docs/RUNBOOK.md.
- Prereqs: Ubuntu 22.04 on eMMC, NVMe + SATA disks ready, SSH key access, DNS records for `api.lab.home.arpa` and `*.apps.lab.home.arpa` → 192.168.1.160.
- Set your repo URL: replace the `repoURL` in `gitops/bootstrap/root-app.yaml`.
- Bootstrap:

      # Dry run of the storage playbook first (--check makes no changes)
      ansible-playbook -i ansible/inventory/hosts.yml ansible/playbooks/storage.yml --check
      # Then apply the playbooks in order
      ansible-playbook -i ansible/inventory/hosts.yml ansible/playbooks/storage.yml
      ansible-playbook -i ansible/inventory/hosts.yml ansible/playbooks/k3s.yml
      ansible-playbook -i ansible/inventory/hosts.yml ansible/playbooks/backup.yml
      ansible-playbook -i ansible/inventory/hosts.yml ansible/playbooks/argocd.yml

- Verify: `kubectl get nodes`, then open `https://argocd.apps.lab.home.arpa`.
Add a workload: add a directory under gitops/workloads/ and an Application in
gitops/apps/, then merge to main. Argo deploys it within ~30 seconds.
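
For orientation, an Application for a new workload could look roughly like the sketch below. The app name, `repoURL`, project, and the `prune` and `CreateNamespace` choices are assumptions; only `selfHeal` and the `gitops/workloads/` layout come from this README.

```yaml
# Sketch only: sample-app and the repoURL are hypothetical placeholders.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: sample-app
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://example.com/your/repo.git   # replace with your repo URL
    targetRevision: main
    path: gitops/workloads/sample-app
  destination:
    server: https://kubernetes.default.svc       # the local cluster
    namespace: sample-app
  syncPolicy:
    automated:
      selfHeal: true     # revert manual drift back to what git says
      prune: true        # assumption: delete resources removed from git
    syncOptions:
      - CreateNamespace=true
```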
Change anything: edit git, never poke the cluster directly. The sample app is a working template.
Secrets: store in Vault under secret/lab/<name>, then create an ExternalSecret in
the workload namespace. See gitops/workloads/immich/external-secret.yaml for an example.
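
As a rough illustration of the pattern (not a copy of the file referenced above), an ExternalSecret reading from Vault KV v2 might look like this. The store name, secret names, and property are hypothetical. The `apiVersion` depends on the installed External Secrets Operator release, and the `key` is written relative to a store assumed to be mounted at `secret`.

```yaml
# Sketch only: store name, target name, and property are hypothetical.
apiVersion: external-secrets.io/v1beta1   # newer operator releases serve v1 instead
kind: ExternalSecret
metadata:
  name: immich-db
  namespace: immich
spec:
  refreshInterval: 1h
  secretStoreRef:
    kind: ClusterSecretStore
    name: vault                # hypothetical store pointing at Vault on rpi5
  target:
    name: immich-db            # Kubernetes Secret that the operator creates
  data:
    - secretKey: DB_PASSWORD   # key inside the Kubernetes Secret
      remoteRef:
        key: lab/immich        # i.e. secret/lab/immich, assuming the store's path is "secret"
        property: db_password  # hypothetical field in the Vault entry
```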
See docs/UPDATES.md for the full update workflow. Short version:
| Layer | Update | Rollback |
|---|---|---|
| Container image | Renovate PR → merge → Argo syncs | `git revert HEAD && git push` (~60s) |
| k3s binary | `make update-k3s` after Renovate PR | Re-run with previous version |
| OS packages | `make update-vms` (drain → apt → uncordon) | Restore from backup |
| Pi-hole | `make update-pihole` (secondary first) | Re-run `pihole -up` |
| Vault | Upgrade via apt; run `make check-vault` after | Restart + unseal |
| Windows nodes | Ansible `windows-bootstrap.yml` | Manual |
- Ansible Vault: `immich_db_password`, `lab_user_password_hash`, `windows_ansible_password`
- Vault KV v2: `secret/lab/immich`, `secret/lab/grafana`, `secret/lab/argocd-deploy-key`
- SSH password auth disabled on all Linux hosts; root locked
- Vault auto-unseal via systemd service on rpi5 (keys file on-disk, `root:root 0400`)
- Never commit: `/etc/restic/password`, `ansible/.vault_pass`, any kubeconfig or k3s token
See docs/SECURITY.md for the full security model.
- Never `mkfs`/`wipefs` the cold disks (`/dev/md0`, `/dev/md1`)
- Never run `restic forget`/`prune` by hand; retention is handled by backup timers only
- Never stop `smbd`, `nfs`, `backup-nas`, or `backup-etcd`
- Before any hot-tier storage change: confirm last backup succeeded
- Rotate octopi Pi-hole credentials (currently plaintext)
- Flash octopi to Raspberry Pi OS Bookworm → upgrade to Pi-hole v6
- Move photo library (~1.3 TB) into `/mnt/cold-8t/immich` so Immich can serve it
- Find rpi4b (or replacement) → bootstrap OpenLDAP
- Vault TLS — currently plain HTTP; wire cert-manager before exposing beyond LAN
- LDAP integration with Immich SSO once OpenLDAP is up
- n150-1/2 → OVMS Intel iGPU for embeddings (`embeddings` model in LiteLLM config already wired)
- Offsite restic backup target
- Pin Immich image tag (currently floating on `:release`); Renovate will open the first PR