
Why not? I would argue that they use Linux namespacing, cgroups, etc.



In theory it is a security boundary. But the attack surface is so big, and local privilege escalation bugs so common, that you should not rely on it to isolate different untrusted users.


You can say the same about VMs, to some extent.

Containers absolutely are intended to be a security boundary.


They most certainly are not. That's a common misbelief. Containers are not designed as a security boundary, they just happen to function as one most of the time.

VMs on the other hand actually are designed as a security boundary, but even then there are still attacks you can do against other VMs on the same box.


I agree that both are security boundaries in theory. But a minimal hypervisor is much stronger than a cgroup container. Cgroup containers are a thin door made of wood, VMs a vault door made of steel. So people saying "containers are no security boundary" are exaggerating a bit, but not much.

A minimal VM like Firecracker has a small attack surface, so I'm willing to trust that privilege escalation/VM escapes will be rare.

A process restricted by cgroup/namespace/etc. still has access to the huge API surface exposed by the kernel, so privilege escalation is common, and I'm unwilling to trust this mechanism to isolate malicious code.
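
To make that concrete, here is a minimal sketch (Python, Linux only) of how a process can check what is actually restricting it. It parses /proc/self/status and reports the seccomp mode and the number of effective capabilities. Seccomp and CapEff are real fields in that file on kernels built with seccomp support; the helper names and the sample text in the usage note are my own, purely for illustration.

```python
# Seccomp modes as reported in /proc/<pid>/status.
SECCOMP_MODES = {0: "disabled", 1: "strict", 2: "filter"}


def parse_status(text):
    """Turn the 'Key:\tvalue' lines of a /proc status file into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def summarize(fields):
    """Report the seccomp mode and how many effective capabilities are set."""
    seccomp = fields.get("Seccomp")
    cap_eff = fields.get("CapEff")
    mode = "unknown"
    if seccomp is not None:
        mode = SECCOMP_MODES.get(int(seccomp), "unknown")
    cap_count = None
    if cap_eff:
        # CapEff is a hexadecimal bitmask, one bit per capability.
        cap_count = bin(int(cap_eff, 16)).count("1")
    return {"seccomp": mode, "effective_capability_count": cap_count}


def read_own_status(path="/proc/self/status"):
    """Return the status text, or an empty string where /proc is unavailable."""
    try:
        with open(path) as f:
            return f.read()
    except OSError:
        return ""


if __name__ == "__main__":
    print(summarize(parse_status(read_own_status())))
```

Feeding it a made-up sample such as "CapEff:\t00000000a80425fb\nSeccomp:\t2\n" reports a seccomp filter and 14 effective capabilities. Whatever the filter lets through still lands in the one kernel shared with every other tenant.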


I agree that they're not very good ones, but a container escape would be treated by everyone the same way a VM escape would be: instant patching, coordinated/embargoed disclosure, AWS finding out before you do, etc.

They didn't start out at the design phase that way, but they absolutely are today.



Of course VM escapes exist. But many of the vulnerabilities are in functionality that isn't relevant for modern servers. Hardware virtualization support prevents many attacks. For example, Firecracker supports little more than network, block storage and vsocks, which keeps the attack surface small.


Containers are not intended to be a security boundary -- functionality along those lines has been gradually backported as maintainers realized that nobody was going to care when they said "don't use these as a security boundary".

There's a world of difference between the amalgamation of hacks that comprise cgroups and something like BSD jails, which are, and afaik always have been, intended to be a security boundary. Jails implement real first-class kernel isolation for jailed processes, not just another subtree under proc that provides some direction to the kernel around resource consumption/priority and relies on UID/GID hacks to control access.



You expect people to read a book to find out your perspective? Do you have cliff notes on why using isolation mode doesn’t provide a security boundary?


Containers provide resource isolation using a shared kernel but are not intended to be used in hostile multitenancy scenarios.

A key feature of OS virtualisation is the strong segmentation boundary between

1. Guests

2. Guests and the hypervisor.

For this reason, VMs are seen to provide a stronger security boundary than containers and are used in preference where that aspect is critical owing to environment, multi-tenancy, or business context.

See also https://searchcloudsecurity.techtarget.com/tip/VMs-vs-contai...


So again, what about isolation mode? I don’t know what this is called in the Linux world, but in Windows this feature does exactly this. Still a shared kernel, but a far cry from what you're explaining.


It's not good enough for multi-tenant setups. A single malicious customer can potentially steal data from other customers. The Docker team also considers security to be a pretty low priority.


To say nothing of side-channel attacks [0].

People need to stop looking at containers as a cheap way to get security. They might be a more convenient way to get lots of apps running on a single machine, but they're not very secure.

[0] https://ieeexplore.ieee.org/document/7847002


1. I expect people to move towards a VM per pod model, even in private setups. Firecracker claims a memory overhead of 5 MB, and a minimal QEMU setup shouldn't be too bad either.

2. It sounds like this paper is mainly about covert channels, not side channels. Covert channels assume cooperation between both sides, so they're only relevant if one of the sides can't communicate trivially (e.g. via network).
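
On point 1, the arithmetic behind the overhead claim is easy to sketch. This takes the claimed 5 MB per microVM at face value and ignores guest kernel and page cache memory, which is an oversimplification. The pod count and host size are made-up numbers, and the function names are mine.

```python
def vm_overhead_mb(pod_count, overhead_per_vm_mb=5.0):
    """Total VMM overhead in MB if every pod gets its own microVM."""
    if pod_count < 0:
        raise ValueError("pod_count must be non-negative")
    return pod_count * overhead_per_vm_mb


def overhead_fraction(pod_count, host_memory_mb, overhead_per_vm_mb=5.0):
    """Share of host memory spent on per-VM overhead (0.0 to 1.0)."""
    if host_memory_mb <= 0:
        raise ValueError("host_memory_mb must be positive")
    return vm_overhead_mb(pod_count, overhead_per_vm_mb) / host_memory_mb


if __name__ == "__main__":
    pods = 100                # hypothetical pod count
    host_mb = 64 * 1024       # hypothetical 64 GiB host
    print(vm_overhead_mb(pods), "MB total")
    print(f"{overhead_fraction(pods, host_mb):.2%} of host memory")
```

With 100 pods that is 500 MB in total, which is under 1% of a 64 GiB host under the assumptions above.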


> vm per pod .. firecracker

agreed. AWS gets a lot of flak, but open-sourcing Firecracker was really great. I'd really prefer to see us move toward vms instead of containers, even if we kept the same k8s abstractions.

> .. covert ..

thanks for the catch, should have taken more time. Here's a better paper:

https://hal.inria.fr/hal-01591808/document


> I'd really prefer to see us move toward vms instead of containers, even if we kept the same k8s abstractions

1. For me, containers are one of those abstractions, defined by exposing an application-controlled userspace. Containers can be implemented by different isolation technologies, from simple chroot/cgroup/namespaces... to VMs.

2. I'd still use chroot&co to partially isolate containers within a pod, while using VMs to strongly isolate pods from each other. This enables features like shared block-devices, unix-domain-sockets and monitoring the processes in an application container from a separate diagnostics container.


I think it's easier to say that namespacing is nearly orthogonal to security. Native containers (i.e. containers not running in a VM) are literally just processes running on the host and need to be secured with the same methods you would use on non-namespaced processes. Namespacing does add another layer when used properly but it doesn't replace any of the existing ones.
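
You can check this yourself. A rough sketch, assuming Linux with /proc mounted: every process has a set of namespace links under /proc/<pid>/ns, and two processes share a namespace when their links have the same target. The function names are mine, and reading another process's links may need root.

```python
import os


def namespace_ids(pid="self"):
    """Map namespace name -> link target (e.g. 'net:[4026531840]') for a process.

    Returns an empty dict if /proc is missing or the directory is unreadable.
    """
    ns_dir = f"/proc/{pid}/ns"
    ids = {}
    try:
        names = os.listdir(ns_dir)
    except OSError:
        return ids
    for name in sorted(names):
        try:
            ids[name] = os.readlink(os.path.join(ns_dir, name))
        except OSError:
            continue  # typically a permission error for someone else's process
    return ids


def split_namespaces(a, b):
    """Return (shared, separate) namespace names that appear in both mappings."""
    common = sorted(a.keys() & b.keys())
    shared = [n for n in common if a[n] == b[n]]
    separate = [n for n in common if a[n] != b[n]]
    return shared, separate


if __name__ == "__main__":
    mine = namespace_ids("self")
    init = namespace_ids(1)  # PID 1 as seen from this process
    shared, separate = split_namespaces(mine, init)
    print("shared with PID 1:", shared)
    print("separate from PID 1:", separate)
```

Run on the host against a containerized process's PID, this would typically show some namespaces as separate, but it is still an ordinary entry in the host's process table, talking to the same kernel.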


I agree that containers are something of a security boundary, as is chroot(). Just not as robust a boundary as an actual VM.



