In theory it is a security boundary. But the attack surface is so big, and local...

sneak · on Feb 9, 2021

You can say the same about VMs, to some extent.

Containers absolutely are intended to be a security boundary.

jedberg · on Feb 9, 2021

They most certainly are not. That's a common misconception. Containers are not designed as a security boundary; they just happen to function as one most of the time.

VMs on the other hand actually are designed as a security boundary, but even then there are still attacks you can do against other VMs on the same box.

CodesInChaos · on Feb 9, 2021

I agree that both are security boundaries in theory. But a minimal hypervisor is much stronger than a cgroup container. Cgroup containers are a thin door made of wood, VMs a vault door made of steel. So people saying "containers are no security boundary" are exaggerating a bit, but not much.

A minimal VM like Firecracker has a small attack surface, so I'm willing to trust that privilege escalation/VM escapes will be rare.

A process restricted by cgroup/namespace/etc. still has access to the huge API surface exposed by the kernel, so privilege escalation is common, and I'm unwilling to trust this mechanism to isolate malicious code.

sneak · on Feb 9, 2021

I agree that they're not very good ones, but a container escape would be treated by everyone the same way a VM escape would be: instant patching, coordinated/embargoed disclosure, AWS finding out before you do, etc.

They didn't start out that way at the design phase, but they absolutely are today.

tekknik · on Feb 10, 2021

https://en.wikipedia.org/wiki/Virtual_machine_escape

CodesInChaos · on Feb 10, 2021

Of course VM escapes exist. But many of the vulnerabilities are in functionality that isn't relevant for modern servers. Hardware virtualization support prevents many attacks. For example, Firecracker supports little more than networking, block storage and vsock, which keeps the attack surface small.

cookiecaper · on Feb 9, 2021

Containers are not intended to be a security boundary -- functionality along those lines has been gradually backported as maintainers realized that nobody was going to care when they said "don't use these as a security boundary".

There's a world of difference between the amalgamation of hacks that make up cgroups and something like BSD jails, which are, and AFAIK always have been, intended to be a security boundary. Jails implement real, first-class kernel isolation for jailed processes, not just another subtree under proc that gives the kernel some direction on resource consumption/priority and relies on UID/GID hacks to control access.

beermonster · on Feb 9, 2021

No, they are not.

https://info.aquasec.com/container-security-book

tekknik · on Feb 10, 2021

You expect people to read a book to find out your perspective? Do you have cliff notes on why using isolation mode doesn’t provide a security boundary?

beermonster · on Feb 10, 2021

Containers provide resource isolation using a shared kernel but are not intended to be used in hostile multitenancy scenarios.

A key feature of OS virtualisation is the strong segmentation boundary between:

1. Guests

2. Guests and the hypervisor.

For this reason, VMs are seen to provide a stronger security boundary than containers and are preferred where that aspect is critical, whether because of the environment, multi-tenancy or business context.

See also https://searchcloudsecurity.techtarget.com/tip/VMs-vs-contai...

tekknik · on Feb 12, 2021

So again, what about isolation mode? I don't know what this is called in the Linux world, but in Windows this feature does exactly this. It's still a shared kernel, but a far cry from what you're explaining.