I am Philip, an engineer working at Distr, which helps software and AI companies distribute their applications to self-managed environments. Our Open Source Software Distribution platform is available on GitHub (github.com/distr-sh/distr) and orchestrates both Docker Compose and Docker Swarm deployments on customer hosts every day.

Most of the production incidents I have seen on Docker Compose hosts come from the same handful of quirks: an old container that should have been removed, a disk that filled up overnight, a health check that detected a problem and then did nothing about it, a :latest tag that pointed somewhere new, or a socket mount nobody thought twice about. None of these are bugs in Docker. They are deliberate trade-offs in a tool that started as internal tooling at dotCloud, a PaaS company that wrapped LXC to fix “it works on my machine,” and is now running the back end of a lot of real businesses. This post collects the recurring ones, with the commands and the operational answer for each.

Short answer: yes. Plain Docker Compose can still run real production workloads in 2026, but only if you handle the operational gaps it leaves yourself.

Where Plain Docker Compose Fits in Production

Before the list of quirks, a quick word on the audience. Docker Compose is a declarative way to wire up a multi-container application: one YAML file describes the services, the networks between them, the volumes they share, the environment they need, and, through the patterns for overwriting or patching service configuration, the on-disk configuration each application expects. docker compose up reconciles the host to that file. The sweet spot in production is the single-node deployment built around exactly that: a vendor pushing a multi-container application into a customer environment, an internal team running a long-tail service that does not justify a Kubernetes cluster, an edge box in a retail location.
The footprint is small, the operational overhead is low, and a competent operator can reason about the whole stack from one docker-compose.yaml. There is no control plane behind Compose itself: no scheduler watching the host, no reconciler reapplying state, no operator pushing updates from somewhere else. docker compose up -d runs once and exits.

That architectural simplicity is exactly why the quirks bite.
Compose assumes you, or whoever runs the host, will do the operational work nothing else is doing, and if you ship Compose files to customers the safe assumption is that the customer will not. The rest of this post is about closing the gap between what Compose does and what a production host actually needs, either by hand or with an agent that does it for you. If you have already concluded that the gap is too wide and want to compare with the next step up, read our Docker Compose vs Kubernetes breakdown.

Docker Compose Orphan Containers and --remove-orphans

Remove a service from docker-compose.yaml, run docker compose up -d, and the container you removed keeps running. It is detached from the project but still bound to the same networks and ports. docker compose ps will not show it, because Compose only lists what is in the current file. docker ps --filter label=com.docker.compose.project=<name> will, because Docker still has the label on the container. This is how you discover, six months in, that an old worker service has been quietly consuming RAM since the last refactor.

The fix is one flag:

    docker compose up -d --remove-orphans
    docker compose down --remove-orphans

The flag tells Compose: any container that was once part of this project but is no longer in the file should be removed. Networks Compose created for the project are reconciled the same way on each up, so orphan networks go away too. Volumes are the exception: Compose preserves named volumes by default to protect data, and there is no per-service flag to drop the ones a removed service used. To reclaim that space you have to do it manually: list candidates with docker volume ls --filter dangling=true and docker volume rm by name, or use docker compose down -v if you intend to wipe the project’s volumes wholesale.
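The orphan definition above (a container that carries the project label but whose service is no longer in the file) can also be checked by hand before anything is removed. The following is a minimal sketch, not how Compose itself does it: orphan_services and audit_orphans are hypothetical helper names, and the sketch assumes that containers created by Compose carry the com.docker.compose.service label alongside the project label and that it is run from the directory holding the Compose file.

```shell
#!/bin/sh
# Sketch: list services that still have containers under a Compose project
# label but are no longer declared in the current Compose file.
# orphan_services and audit_orphans are hypothetical helper names.

# Print every name in $2 (running services, one per line) that is missing
# from $1 (declared services, one per line). The ( ) body runs in a
# subshell, so the variables stay local to the function.
orphan_services() (
  declared="$1"
  running="$2"
  printf '%s\n' "$running" | sort -u | while IFS= read -r svc; do
    [ -n "$svc" ] || continue
    # -x: whole-line match, -F: fixed string, -q: no output
    if ! printf '%s\n' "$declared" | grep -qxF -e "$svc"; then
      printf '%s\n' "$svc"
    fi
  done
)

# $1 is the Compose project name. Run from the directory that holds the
# Compose file so that "config --services" reads the current definition.
audit_orphans() (
  project="$1"
  declared=$(docker compose -p "$project" config --services)
  running=$(docker ps -a \
    --filter "label=com.docker.compose.project=$project" \
    --format '{{.Label "com.docker.compose.service"}}')
  orphan_services "$declared" "$running"
)
```

Calling audit_orphans <name> prints one service name per line for each candidate orphan. Keeping the comparison in a separate function from the Docker calls means the diff logic can be exercised without a Docker daemon.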
To audit before deleting, list everything Docker still associates with the project name:

    docker ps -a --filter label=com.docker.compose.project=<name>

Distr’s Docker agent passes RemoveOrphans: true on every Compose Up call, so customer hosts never accumulate orphans across deployment updates. That single flag has eliminated a recurring class of “the old version is still answering on port 8080” support tickets.

Pruning Docker Images and Capping Container Logs

Every docker compose pull keeps the previous image on disk.
Every container with the default json-file log driver writes unbounded JSON to /var/lib/docker/containers/<id>/<id>-json.log. On a busy host this is one of the most common reasons for an outage: the disk fills and Docker stops being able to write anything (logs, metadata, image layers), at which point containers start failing in confusing ways.

The first thing to learn is the audit command:

    docker system df
    docker system df -v

-v breaks the totals down per image, container, volume, and build cache, which is usually enough to spot the offender. From there, the targeted prune commands:

    docker image prune -a --filter "until=168h" -f   # delete unused images created more than 7 days ago
    docker container prune -f                         # remove stopped containers
    docker builder prune -f                           # drop the BuildKit cache

docker volume prune -f exists too, and it is genuinely useful, but read the next aside before you run it.

The other half of the disk story is logs. Cap them at the daemon level, once, in /etc/docker/daemon.json:

    {
      "log-driver": "json-file",
      "log-opts": {
        "max-size": "10m",
        "max-file": "3"
      }
    }

After systemctl restart docker, every new container will rotate its logs at 10 MB and keep at most three log files, a ceiling of about 30 MB per container instead of “until the disk is gone.” Existing containers need to be recreated to pick up the new defaults.

This is one of the topics worth getting right before you ship.

In Distr’s Docker agent the cleanup is built in: each deployment target has an opt-out container image cleanup setting that removes the previous version’s images automatically after a successful update, with retries on failure.
It only fires on success, so the previous image stays on disk if something goes wrong and you need to roll back.

Docker Health Checks Don’t Restart Unhealthy Containers

This is the one that surprises people the most. You add a HEALTHCHECK to your Dockerfile or a healthcheck: block to the service in Compose, you watch the container go from healthy to unhealthy, and then… nothing happens. The Docker Engine reports the status. It does not act on it. restart: unless-stopped is triggered by the container exiting, not by it being marked unhealthy.

You can confirm what Docker actually thinks:

    docker inspect --format='{{json .State.Health}}' <container> | jq

You will see the status, the streak of failures, and the last few probe outputs: useful information that is silently ignored by the engine.

There are three answers to this:

- Run an autoheal sidecar. The community standard is willfarrell/docker-autoheal: a tiny container that mounts the Docker socket, watches for unhealthy events, and restarts the offending container. You opt containers in by labeling them autoheal=true (or set AUTOHEAL_CONTAINER_LABEL=all to monitor everything).
- Run on Docker Swarm. Swarm restarts unhealthy tasks by default. If you are already considering Swarm, this is one of the better reasons.
- Use Distr. Every Distr Docker agent deploys an adapted autoheal service alongside it. The “Enable autoheal for all containers” toggle is on by default at deployment-target creation, so customer-side restarts of unhealthy containers happen without anyone configuring it.
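
To make the sidecar idea concrete, here is a minimal single-pass sketch of what an autoheal-style watcher does. It is not the willfarrell/docker-autoheal implementation: restart_unhealthy is a hypothetical function name, and the 30-second interval in the commented driver loop is an assumption. It uses the health filter of docker ps and reuses the autoheal=true opt-in label mentioned above.

```shell
#!/bin/sh
# Sketch of one autoheal-style pass: restart containers that Docker reports
# as unhealthy and that carry the opt-in label autoheal=true.
# restart_unhealthy is a hypothetical name, not part of any existing tool.

restart_unhealthy() {
  # Filters with different keys are ANDed: unhealthy AND labelled.
  docker ps --filter health=unhealthy --filter label=autoheal=true \
    --format '{{.ID}}' |
  while IFS= read -r id; do
    [ -n "$id" ] || continue
    echo "restarting unhealthy container $id"
    # </dev/null keeps the restart call from reading the ID list on stdin.
    docker restart "$id" >/dev/null </dev/null
  done
}

# Example driver, commented out so sourcing this file has no side effects.
# The 30-second interval is an assumption, not a recommendation.
# while true; do restart_unhealthy; sleep 30; done
```

A script like this needs the same access to the Docker socket as any autoheal sidecar, so the socket caveats later in this post apply to it as well.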
Whichever path you pick, the takeaway is the same: a HEALTHCHECK without something acting on it is a status light, not a self-healing system.

Pinning Docker Images by Digest Instead of :latest

Docker tags are mutable references. myapp:1.4 today is whatever the registry currently has under that tag; tomorrow it can point at a different layer set after a re-push. :latest is the worst offender because everyone treats it as a synonym for “stable” when in practice it often means “whatever was pushed most recently.” It is also the silent default: an unqualified image: nginx in a Compose file is treated as image: nginx:latest, so even Compose files that never type the word land on it by accident. The result, in production, is that two hosts pulling the “same” tag five minutes apart can end up running different code.

The fix is to pin by content-addressable digest. Every image has one, and Docker accepts it anywhere a tag would go.

To find the digest for an image you already pulled:

    docker image inspect --format='{{index .RepoDigests 0}}' myapp:1.4
    # myapp@sha256:9b7c…

Or, without pulling, from the local Docker installation against the remote registry:

    docker buildx imagetools inspect myapp:1.4

In your Compose file, replace the tag with the digest:

    services:
      app:
        image: myapp@sha256:9b7c0a3e1f...

A pull against a digest fails fast if the registry no longer has those bytes, which is exactly what you want: silent drift becomes a loud error. The same image reference works in docker stack deploy, in docker run, and in Kubernetes manifests.

For the broader picture of what your customers can extract from a published image (and why image hygiene matters beyond reproducibility), check out our guide on protecting source code and IP in Docker and Kubernetes deployments.
And if you’re still picking a registry, our container registry comparison walks through the trade-offs.

Why Mounting /var/run/docker.sock Is a Security Risk

A container with /var/run/docker.sock mounted can call the Docker API, and the Docker API can launch a privileged container that mounts the host’s root filesystem.
In other words: any container with the socket has effectively root privileges on the host. This is not a Docker bug; it is the threat model of the socket. It deserves a moment of attention because the line that grants this access is one bind mount in a Compose file and is easy to add without thinking about it.

Practical hygiene:

- Inventory the containers that mount the socket. Agents, CI runners, monitoring sidecars, container management UIs: keep the list short and intentional.
- Run rootless Docker where possible. dockerd-rootless-setuptool.sh install sets up a Docker daemon that runs as a regular user. The blast radius of a compromised socket-mounting container shrinks from “full host” to “this user account.”
- Consider socket-proxy. Projects like Tecnativa’s docker-socket-proxy expose a filtered subset of the API to the container that needs it (e.g. read-only containers and events for monitoring) instead of the full socket.
- Keep socket-mounting images minimal. Smaller surface, fewer libraries, fewer ways in.

The Distr Docker agent does mount the socket; it has to, in order to orchestrate Compose and Swarm on the host. We document that boundary openly in the Docker agent docs so customer security teams can review it before installation. The agent authenticates to the Hub with a JWT, and the install secret is shown once and never stored.

Updating Docker Compose Deployments Across Customer Hosts

docker compose pull && docker compose up -d is a fine command if you are SSH’d into the host. At customer scale (dozens of self-managed environments behind firewalls, each with its own change-control process), that manual process doesn’t scale. Docker has no built-in mechanism to push a new manifest to a running host from somewhere else.
Docker Hub webhooks can trigger a CI rebuild when an image is pushed, but they do not reach into a customer’s network and tell their docker compose to pull.

The usual workarounds and what they cost:

- Watchtower: Polls the registry on a schedule, pulls new images, recreates containers. Easy to set up, hard to control. No staged rollout, no rollback path, limited visibility from your side: you find out a customer updated when they file a ticket.
- Bastion + SSH + Ansible/scripts: Works for ten customers.