A modern Proxmox Docker architecture with disposable VMs, VirtIO-FS, and ZFS
Published: 2026-06-07, Revised: 2026-06-13

TL;DR

The Linux kernel's security model is constantly evolving. In 2026, my Docker-in-LXC nesting became increasingly fragile and needed a replacement. Here I describe a state-of-the-art architecture for Proxmox. The post outlines deploying lightweight VMs via cloud-init linked clones, isolating services in rootless Docker namespaces, and using VirtIO-FS with native VFS idmapped mounts for resource-efficient ZFS storage passthrough.

Motivation

Back in 2021, I documented my approach to running Docker inside unprivileged LXC containers on Proxmox. At that time, my hypervisor was quite resource-restricted (a 2013 Xeon with 32GB of memory). This made unprivileged LXC the obvious choice. It worked well for years (and it largely still works; read on).

The Linux kernel has moved on. With the adoption of cgroup v2, stricter AppArmor profiles, and tighter UserNS restrictions, running Docker inside LXC has slowly turned into a battle against the kernel's security model.

In 2023, I wrote about a successor system architecture for running Mastodon in rootless Docker behind an Nginx proxy. I have used this setup extensively for cloud VMs and at work. The concept combines the stronger isolation of VMs with the resource efficiency of running individual services in unprivileged, rootless Linux namespaces.

The logic behind this is that not every 20MB application requires the memory overhead of its own dedicated VM. Conversely, mixing unrelated services (like Nextcloud and Immich) into the same Docker daemon is a security anti-pattern. If a dependency chain attack compromises a library in your photo indexing app, that attacker should not also gain read/write access to your entire Nextcloud document repository. Enforcing the principle of separation of concerns has become even more critical with the rise of dependency chain attacks.
What those previous posts did not cover is how to use this new architecture with the hypervisor's storage layer.
If your storage sits directly on the Proxmox hypervisor (e.g. attached as a JBOD), you need a clean way to share specific paths into isolated VMs. In a homelab, VirtIO-FS bind-mounts are preferable to NFS sharing. NFS is often slow and difficult to configure with user and permission mapping. Imagine, for example, that one system needs read/write access (e.g. Nextcloud handling automatic photo uploads from client phones), while another system strictly needs read-only access (e.g. Immich indexing those same photos), each with its own custom UID and GID mapping. This is the sweet spot I am exploring in this blog post.

This guide outlines the architecture from the ZFS and hypervisor configuration up to the application layer. To follow along, it helps to understand the difference between persistent and ephemeral data, and how these components are layered from the hypervisor down into a nested rootless Docker environment. If you don't know what I am talking about, you may still find individual parts worth copying or useful as best practice. Pick what is relevant to you.

Specifically, I address the following documentation and concept gaps:

Ephemeral storage: ZFS snapshots can easily grow in size when used with Docker. I show how to separate the persistent data from the ephemeral, disposable Docker image content found in ~/.local/share/docker. This separation keeps backups small.

VFS ID-mapping: Network file systems can cause UID/GID mismatches and add network overhead. With VirtIO-FS, I use the Linux kernel's Virtual File System to translate the hypervisor's UID to the guest's unprivileged UID. This avoids exposing the host file structure. I use the X-mount.idmap fstab option for this. Documentation on this specific implementation is not easy to find. It builds upon the idmapped mounts feature introduced by Christian Brauner in Linux 5.12 [1] and its later integration into the standard mount utility with util-linux v2.39 [2].
Linked clones: I initially considered template creation unnecessary. A manual VM installation only takes about 30 minutes. However, treating VMs as disposable cattle provides a structurally better architecture. It saves disk space, prevents configuration drift, and minimizes human error. And it works perfectly with ZFS cloning.
I will show the commands to create a dedicated tooling VM, how to customize a .qcow2 image, and the steps to deploy an IP-agnostic template via Cloud-Init in Proxmox.

Isolation: Combining systemd rootless user namespaces with discrete VirtIO-FS mounts allows running multiple applications on a single VM. These applications remain isolated and cannot access each other's data or Docker environments. I will provide some guidance on how to decide when to use Docker inside rootless namespaces versus one dedicated VM for a single service.

Architecture Overview

You may have seen the core proxy concept in my previous posts. We use a standard setup of Nginx on the VM host as a central reverse proxy. It forwards traffic through localhost to individual services, which run in their own rootless Docker namespaces. I consider this the typical economical setup: the resources of a single VM are shared among multiple, largely isolated services.

At the base, we have a central ZFS pool that hosts all data. I use a common setup design with 3 hardware pools:

rpool/bpool - my Proxmox boot pool, a mirror of two small SSD drives
tank_ssd - my VM/service drive, a ZFS mirror of two SSDs. All fast data goes here, including VMs, logs, temporary file folders etc.
tank_hdd - my data drive for the slow and big data. Currently, this is a 6x8TB raidz2.

                                                 [ Web/LAN ]
                                                      |
                                         443 / 80 (Port Forwarding)
                                                      |
+-----------------------------------------------------------------------------------------------------------+
| Proxmox Hypervisor                                  v                                                     |
|                                                                                                           |
|  +-----------------------------------------+               +-----------------------------------------+    |
|  | VM 1: Heavyweight (Single-Tenant)       |               | VM 2: Lightweight (Multi-Tenant)        |    |
|  | Pattern: Rootful Docker                 |               | Pattern: Rootless User Namespaces       |    |
|  |                                         |               |                                         |    |
|  |            [ Nginx / SSL ]              |               |             [ Nginx / SSL ]             |    |
|  |                   |                     |               |            |                |           |    |
|  |             127.0.0.1:8080              |               |     127.0.0.1:8081   127.0.0.1:8082     |    |
|  |                   v                     |               |            v                v           |    |
|  |    +-----------------------------+      |               |     +-------------+  +-------------+    |    |
|  |    | systemd: docker.service     |      |               |     | User:       |  | User:       |    |    |
|  |    | (Daemon runs as root)       |      |               |     | funkwhale   |  | immich      |    |    |
|  |    |                             |      |               |     | UID: 2001   |  | UID: 2002   |    |    |
|  |    | +-------------------------+ |      |               |     |             |  |             |    |    |
|  |    | | Nextcloud Container     | |      |               |     | +---------+ |  | +---------+ |    |    |
|  |    | | App UID: 33 (www-data)  | |      |               |     | | Docker  | |  | | Docker  | |    |    |
|  |    | +-------------------------+ |      |               |     | | App:1000| |  | | App:1000| |    |    |
|  |    +--------------|--------------+      |               |     | +----|----+ |  | +----|----+ |    |    |
|  |                   |                     |               |     +------|------+  +------|------+    |    |
|  |                   |                     |               |            |                |           |    |
|  |        [ /srv/nextcloud/data ]          |               |   [ /srv/media/fw ] [ /srv/media/im ]   |    |
|  +-------------------|---------------------+               +------------|----------------|-----------+    |
|                      | VirtIO-FS                                        | VirtIO-FS      | VirtIO-FS      |
|                      v                                                  v                v                |
| ........................................................................................................  |
| : ID-Mapping Translation Layer (Linux Kernel on Hypervisor)                                            :  |
| :                                                                                                      :  |
| :        X-mount.idmap=b:1005:33:1                 X-mount.idmap=b:1005:2100999:1  ...b:1005:2200999:1 :  |
| :....................|..................................................|................|.............:  |
|                      |                                                  |                |                |
|                      v                                                  v                v                |
|  +---------------------------------------------------------------------------------------------------+    |
|  | ZFS Storage (tank_ssd & tank_hdd)                                                                 |    |
|  |                                                                                                   |    |
|  | [ /secure ] (Persistent / Encrypted / Backed Up)                                                  |    |
|  |   - /media/secure/nextcloud/data (Owned by Hypervisor UID 1005)                                   |    |
|  |   - /media/secure/00_Alex/Music (Owned by Hypervisor UID 1005)                                    |    |
|  |                                                                                                   |    |
|  | [ /ephemeral ] (Disposable / Unencrypted / No Backup)                                             |    |
|  |   - /vm-200-disk-1 (/var/lib/docker)                                                              |    |
|  |   - /vm-201-disk-1 (/mnt/ephemeral_docker/funkwhale)                                              |    |
|  +---------------------------------------------------------------------------------------------------+    |
+-----------------------------------------------------------------------------------------------------------+

Now, for deploying VMs and services, you have to decide between two types. Heavyweight: a single service in a single VM. Lightweight: a single VM with multiple rootless users, each with their own nested Docker systemd units. If you want some guidance on that decision, read the section below.

Decide what type of VM-Deployment you need

I use Docker (or Podman) in both types, which makes migration between the two variants relatively easy. So this is largely a system design question for me. For instance, I decided to use the Heavyweight variant (single service on a dedicated VM) for the following:

Nextcloud
Gitlab
Home Assistant
Mailcow dockerized

These are all mission-critical services in my homelab. I wanted them to be maximally isolated. They also require a lot of special firewall rules, which are easy to configure at the IP level of a single VM. Also, Nextcloud has r/w access to a large part of my underlying ZFS storage layer. If another service in that VM became compromised, that access privilege would offer a pretty large attack surface, which I wanted to avoid.
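The ID values in the diagram's translation layer can be reproduced with a little arithmetic. The following is a minimal Python sketch, not part of the setup itself. It assumes that each rootless user's subordinate UID range starts at 2100000 (funkwhale) and 2200000 (immich); these start values are not stated in the post and are only inferred from the mapped IDs in the diagram. The helper names are hypothetical. In rootless Docker, container root maps to the unprivileged user itself, and container UID n (n >= 1) maps to the n-th subordinate UID.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RootlessUser:
    """A VM user that runs its own rootless Docker daemon."""
    name: str
    uid: int           # UID of the user inside the VM (taken from the diagram)
    subuid_start: int  # ASSUMPTION: first subordinate UID from /etc/subuid


def vm_uid_for_container_uid(user: RootlessUser, container_uid: int) -> int:
    """Translate a UID inside a rootless container to the UID the VM kernel sees.

    Container root (0) maps to the unprivileged user itself. Container UID
    n >= 1 maps to the n-th subordinate UID, i.e. subuid_start + n - 1.
    """
    if container_uid < 0:
        raise ValueError("container_uid must be non-negative")
    if container_uid == 0:
        return user.uid
    return user.subuid_start + container_uid - 1


def idmap_option(first_id: int, second_id: int, count: int = 1) -> str:
    """Format the mount option as it appears in the diagram.

    "b" applies the mapping to both UID and GID. The field order is copied
    from the diagram; check mount(8) for the exact meaning of each field.
    """
    return f"X-mount.idmap=b:{first_id}:{second_id}:{count}"


HYPERVISOR_UID = 1005  # owner of the shared paths on the hypervisor (diagram)

funkwhale = RootlessUser("funkwhale", uid=2001, subuid_start=2_100_000)
immich = RootlessUser("immich", uid=2002, subuid_start=2_200_000)

for user in (funkwhale, immich):
    # The application runs as UID 1000 inside each container (diagram).
    vm_uid = vm_uid_for_container_uid(user, 1000)
    print(user.name, idmap_option(HYPERVISOR_UID, vm_uid))
# funkwhale X-mount.idmap=b:1005:2100999:1
# immich X-mount.idmap=b:1005:2200999:1
```

For the Heavyweight VM no subordinate range is involved, because the daemon runs as root: `idmap_option(1005, 33)` yields the `b:1005:33:1` entry shown for Nextcloud's www-data user.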
The second variant, Lightweight, I use for smaller, dedicated feature services:

Immich: Only needs read access to the portion of my Nextcloud user's InstantUpload/Camera folders
Funkwhale: Only needs read access to the portion of my Nextcloud user's music folders
Grafana/Invidious/Miniflux etc.: These don't need any access to my tank_hdd. They are very small, and spending the overhead of a dedicated VM on each of them would be a pity.

How many VMs of each type you create is up to you. I have several VLANs that I use to segment my user network access. There's a private development VLAN (Gitlab, MQTT, Grafana). Then there's a "guest" services VLAN, akin to a demilitarized zone, for internal services like Funkwhale, Invidious, Miniflux (etc.) that need to be accessed by a large portion of my private network users. Having these services organized in separate VLANs simply makes it easier to manage firewall rules.

The key is not how you assign services, but that you follow the same organization pattern on each level of the hierarchy. On each VM, you will know immediately where to find the data and how to deal with service updates and backups. Consistency is the main benefit that makes such a hierarchical system easy to manage.

Creating a qcow2 VM template

To utilize disposable VMs, we must first build a Cloud-Init template. In homelabs and small enterprises, there is a habit of treating servers as "Pets": lovingly hand-crafted and manually patched systems that we tend to individually. A modern architecture considers compute as "Cattle": numbered, identical, and replaced when sick. If you are still the "Pet" type, I strongly advise switching to the Cattle approach, for your peace of mind.

By separating the state (the persistent data on ZFS) from the compute (the ephemeral "Cattle" running the OS), we eliminate configuration drift. Rather than clicking through a Debian ISO installer and maintaining a server for years, we download an official pre-built qcow2 cloud image.
However, instead of doing this on the hypervisor, we use a disposable tooling VM. This adheres to the "Infrastructure as Markdown" [3] philosophy.