I want to be able to run code from untrusted sources (other people, users of my SaaS application, LLMs) in an environment where I can control the blast radius if something goes wrong.
Hey Simon, given it's you ... are you concerned about LLMs attempting to escape from within the confines of a Docker container or is this more about mitigating things like supply chain attacks?
I'm concerned about prompt injection attacks telling the LLM how to escape the Docker container.
You can almost think of a prompt injection attack as a supply chain attack - but regular supply chain attacks are a concern too, what if an LLM installs a new version of an NPM package that turns out to have been deliberately infected with malware that can escape a container?
When you use Docker you already have full control over the networking layer.
You can bind its networking to another container that acts as a proxy/filter. How does WASM offer that?
With a reverse proxy you can log requests, filter them if needed, restrict the allowed domains, or do packet inspection if you want to go crazy mode.
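A rough sketch of that proxy/filter setup with plain Docker commands. This is illustrative only: the names `sandbox-net`, `egress-net`, `egress-proxy`, and `my-untrusted-image` are placeholders, and it assumes a Squid image (e.g. `ubuntu/squid`, default port 3128) as the filtering proxy:

```shell
# An "internal" network has no route to the outside world.
docker network create --internal sandbox-net
# A normal bridge network for the proxy's outbound side.
docker network create egress-net

# The proxy/filter container straddles both networks, so it alone can
# reach the internet; this is where you log, filter, and restrict
# allowed domains.
docker run -d --name egress-proxy --network sandbox-net ubuntu/squid
docker network connect egress-net egress-proxy

# The untrusted workload only joins the internal network and is told
# to route HTTP(S) through the proxy.
docker run -d --name untrusted --network sandbox-net \
  -e http_proxy=http://egress-proxy:3128 \
  -e https_proxy=http://egress-proxy:3128 \
  my-untrusted-image
```

Note the proxy environment variables only steer cooperative programs; anything ignoring them and opening raw connections simply fails, because `--internal` networks have no outbound route at all.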
And if an actor is able to tailor-fit a prompt that escapes Docker, I think you have bigger issues in your supply chain.
I feel WASM is a bad solution here. What does it bring that a VM or Docker can't do?
And escaping a Docker container is not that simple; it requires a lot of heavy lifting and is not always possible.
Aside from my worries about container escape, my main problem with Docker is the overhead of setting it up.
I want to build software that regular users can install on their own machines. Telling them they have to install Docker first is a huge piece of friction that I would rather avoid!
The lack of network support for WASM fits my needs very well. I don't want users running untrusted code which participates in DDoS attacks, for example.
You get the same lack of network support with cgroups containers if you configure them properly. It isn't that the network is connected and then filtered out; it's as though it's disconnected entirely. You could instead configure it with network support that's filtered through iptables, but that does seem more dangerous, though in practice that isn't where the escapes are coming from. A network namespace can be left empty, without network interfaces, and a process made to use the empty namespace. That way there isn't any traffic flowing from an interface to be checked against iptables rules.
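You can see the empty-namespace behavior on any modern Linux box with the ip-netns tooling (the namespace name here is made up; requires root):

```shell
# Create a fresh, empty network namespace.
sudo ip netns add empty

# Inside it there are no interfaces except a loopback that is DOWN,
# so there is nothing for iptables to even filter.
sudo ip netns exec empty ip addr show

# Any attempt to talk to the outside fails immediately
# ("Network is unreachable") rather than being filtered.
sudo ip netns exec empty ping -c 1 1.1.1.1

# Clean up.
sudo ip netns del empty
```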
I think that threat is generally overblown in these discussions. Yes, container escape is less difficult than VM escape, but it still requires a major kernel 0day; it is by no means easy to accomplish. Doubly so if you have some decent hygiene and don't run anything as root or do anything else dumb.
When was the last time we heard of a container escape actually happening?
Just because you haven't heard of it doesn't mean the risk isn't real.
It's probably better to make some kind of risk assessment and decide whether you're willing to accept this risk for your users / business, and what you can do to mitigate it. The truth is the risk is always there; it only gets smaller as you layer several isolation mechanisms until it becomes insignificant.
I think you meant “container escape is not as difficult as VM escape.”
A malicious workload doesn’t need to be root inside the container; the attack surface is the shared Linux kernel.
Not allowing root in a container might prevent an escaped container from immediately having root access outside its namespace. But if an escape succeeds, the attacker could leverage yet another privilege escalation mechanism to go from non-root to root.
Better not to rely on unprivileged containers to save you. The problem is:
Breaking out of a VM requires a hypervisor vulnerability, and those are rare.
Breaking out of a shared-kernel container requires a kernel syscall vulnerability, and those are common. The syscall attack surface is huge, and much of it is exploitable even by unprivileged processes.
Both can be made very hard to escape. The Podman community is smaller, but at this point it's more focused on solving technical problems than Docker, which is trying to increase subscription revenue. I have gotten a configuration for running something in isolation that I'm happy with in Podman, and while I think I could do exactly the same thing in Docker, it seems simpler to me in Podman.
Apologies for repeating myself all over this part of the thread, but the vulnerabilities here are something that Podman and Docker can't really do anything about as long as they're sharing a kernel between containers.
If you're going to make containers hard to escape, you have to host them under a hypervisor that keeps them apart. Firecracker was invented for this. If Docker could be made unescapable on its own, AWS wouldn't need to run their container workloads under Firecracker.
This same, not especially informative content is being linked to again and again in this thread. If container escapes are so common, why has nobody linked to any of them rather than a comment saying "There are lots" from 3 years ago?
Perspective is everything, I guess. You look at that three-year-old comment and think it's not particularly informative. I look at that comment and see an experienced infosec pro at Fly.io, which runs billions of container workloads, who doesn't trust the cgroups+namespaces security boundary enough and so goes to the trouble of running Firecracker instead. (There are other reasons they landed there, but the security angle's part of it.)
Anyway if you want some links, here are a few. If you want more, I'm sure you can find 'em.
Some are covered by good container deployment hygiene and reduced privilege, but from my POV it looks like the container devs are plugging their fingers into a barrel that keeps springing new leaks.
(To be fair, modern Docker's a lot better than it used to be. If you run your container unprivileged and don't give it extra capabilities and don't change syscall filters or MAC policies, you've closed off quite a bit of the attack surface, though far from all of it.)
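Concretely, that hygiene boils down to a handful of `docker run` flags; a sketch, with the image name as a placeholder:

```shell
# Run as an unprivileged user, drop every Linux capability, forbid
# setuid-style privilege gains, mount the root filesystem read-only,
# and give the container no network interfaces at all.
docker run --rm \
  --user 1000:1000 \
  --cap-drop=ALL \
  --security-opt no-new-privileges \
  --read-only \
  --network none \
  my-untrusted-image
```

Even with all of that, the workload still talks to the shared kernel through the syscall interface, which is the residual attack surface discussed above.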
But keep in mind that shared-kernel containers are only as secure as the kernel, and today's secure kernel syscall can turn insecure tomorrow as the kernel evolves. There are other solutions to that (look into gVisor and ask yourself why Google went to the trouble to make it -- and the answer is not "because Docker's security mechanisms are good enough"), but if you want peace of mind I believe it's better to sidestep the whole issue by using a hypervisor that's smaller and much more auditable than a whole Linux kernel shared across many containers.
I mean Docker runs with root privileges for the most part. Yes, I know Docker can run rootless too, but Podman does it out of the box.
So if your container is compromised and an attacker can somehow break out of it, with default root-running Docker they might land with root privileges, whereas with default Podman the container is a user-run executable, so they'd need another zero day or something to get root, y'know?
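A quick way to see the difference from a shell (output notes in the comments are approximate, not exact output):

```shell
# Default Docker: the CLI talks to a root-owned daemon, so container
# processes sit in root's process tree on the host.
sudo docker run --rm alpine id    # reports uid=0(root) inside

# Rootless Podman: no daemon; the container runs under your own host
# UID, and "root" inside is just a user-namespace mapping.
podman run --rm alpine id         # also reports uid=0(root) inside...
podman unshare cat /proc/self/uid_map   # ...but the map shows it backed
                                        # by your unprivileged host UID
```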