Citation needed. I/we ran illumos zones for multitenant untrusted workloads for ...

tptacek · on Aug 3, 2022

I didn't give a statistic in this quote, so I won't cite one. Illumos Zones are shared-kernel isolation: all the tenants share a kernel attack surface. That's not sufficient for untrusted cotenants; any kernel LPE compromises the whole scheme. The kernel's attack surface is drastically larger than (say) the KVM attack surface.

bcantrill · on Aug 3, 2022

Well, it's not just the KVM attack surface -- it's KVM + QEMU, and there have emphatically been escapes. Yes, the shared kernel is an issue -- but so is a shared hypervisor or a shared CPU. It's a risk to be mitigated, and it's a gross exaggeration to say that it "isn't safe for multitenant untrusted workloads."

tptacek · on Aug 4, 2022

It's QEMU for you, or was at the time, but I'm not talking about what the right engineering decision was in 2015, I'm talking about what makes sense today, with memory-safe hypervisors.

For untrusted multitenant workloads in 2022, for arbitrary code and without a language-level sandbox, a shared-kernel workload isolation system might be malpractice. Again: you can easily rattle off the LPEs that would have broken a Linux shared-kernel scheme (of any realistic sort) over the last couple years.

Someone else dunked on you for a Joyent bug from a bunch of years ago. I didn't. Security researchers were dunking on shared-kernel isolation even back then, but it was a much harder decision back when the only alternative was expensive, memory-unsafe legacy hypervisors, and I would have had a hard time weighing guest escape vs kernel LPE back then too.

But this time and the last time we bounced off each other on this, we weren't talking about 2015; we're talking about today, when there are multiple memory-safe hypervisor options. I don't think it's an open question anymore.

bcantrill · on Aug 4, 2022

I don't think we're really comparing hardware virtualization to OS-based virtualization (we at Oxide run HW-based virt inside of OS-based virt so that's a bit of a false dichotomy to begin with), but rather whether or not it's "malpractice" (your term) to run a multitenant workload on illumos zones. And to be clear: we're not talking about Linux here; we're talking about illumos -- for which OS-based virtualization has an entirely different design center. (And indeed, a design center that was -- from the beginning -- designed for securing multitenant workloads.) So your experiences with Linux are of limited relevance here, frankly.

tptacek · on Aug 4, 2022

I'm happy to leave it here. I stand by the assessment I offered upthread. Cheers!

bcantrill · on Aug 4, 2022

To you as well!

psanford · on Aug 3, 2022

ZDI-16-464

bcantrill · on Aug 3, 2022

Yes, very familiar with that one! Not only is this one of the very few zone escapes over our years in production (responsibly disclosed, thankfully!), but the bug itself was introduced by yours truly -- and is part of what gave me religion on Rust. To be clear, my assertion was not that any particular body of software is invulnerable, but rather taking issue with the assertion that zones-based infrastructure "isn't safe for multitenant untrusted workloads"; we ran exactly that for a decade. I also very much stand by my assertion that Meltdown was a greater source of vulnerability than zones -- and if one wishes to assert that a shared kernel makes zones unsafe, then one also must say that a shared microprocessor is unsafe. For some folks, that will be a completely reasonable assertion, but for most, they will understand that a shared microprocessor -- past vulnerabilities aside -- can in fact be made safe for multitenant use.

tptacek · on Aug 4, 2022

I don't think you should have to apologize for a 2016 Joyent bug.

I also don't understand how you can coherently argue that people should have "religion about Rust", but also put their faith in C-language OS kernels any more than they have to.

Further, I don't understand how Meltdown helps your argument at all here, since both isolation strategies are susceptible. Memory safety also doesn't protect you from control-plane SSRF vulnerabilities, but you can immediately see why "control-plane SSRF vulnerabilities mean memory-safety is overrated" is a bogus argument.

bcantrill · on Aug 4, 2022

I'm not arguing that people should have religion about Rust -- merely explaining that this particular issue was central to my own internalization of some of the very subtle unsafety in C. In terms of Meltdown: I am saying that -- as a practical matter -- it was much more serious for us than the extraordinarily small number of vulnerabilities we have had in zones over the years.

tptacek · on Aug 4, 2022

People should have religion about minimizing the amount of memory-unsafe code in their attack surface.