The Ken Thompson Hack (c2.com)
127 points by stephc_int13 on March 30, 2024 | hide | past | favorite | 43 comments



The team at bootstrappable.org has been working very hard on creating compilers that can bootstrap from scratch to prevent this kind of attack (also known as the "trusting trust" attack). They've gotten to the point where they can bootstrap in a freestanding environment, so they no longer need to trust any OS binaries (see builder-hex0).
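To make the mechanism concrete, here's a toy sketch in Python. Everything in it (the names, the trigger patterns, the fake "binary" format) is invented for illustration; a real attack operates on machine code:

```python
# Toy model of the trusting-trust attack. A "compiler" is a function from
# source text to "binary" text; the subverted one recognizes two targets.

def clean_compile(source):
    """An honest compiler: the binary is just a tagged copy of the source."""
    return "BIN:" + source

def evil_compile(source):
    """Thompson's two-target trick, as a caricature."""
    if "check_password" in source:        # target 1: the login program
        return clean_compile(source) + "+BACKDOOR"
    if "compile" in source:               # target 2: any compiler source
        return clean_compile(source) + "+SELF_REPLICATING_TROJAN"
    return clean_compile(source)

login_src = "def check_password(pw): return pw == stored"
compiler_src = "def compile(src): return 'BIN:' + src"   # perfectly clean source

print(evil_compile(login_src))     # login binary comes out backdoored
print(evil_compile(compiler_src))  # trojan survives a clean-source rebuild
```

The point is the second branch: even if you audit the compiler source and rebuild it, the trojan in the binary you used for the rebuild re-inserts itself, which is exactly why bootstrapping from hand-auditable machine code matters.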

I've spent a lot of my spare time the past year or so working on my own attempt at a portable bootstrappable compiler. It's partly to prevent this attack, and also partly so that future archaeologists can easily bootstrap C even if their computer architectures can't run any binaries from the present day.

https://github.com/ludocode/onramp

It's nowhere near done but I'm starting a new job soon so I felt like I needed to publish what I have. It does at least bootstrap from handwritten x86_64 machine code up to a compiler for most of C89, and I'm working on the final stage that will hopefully be able to compile TinyCC and other similar C compilers soon.


So bootstrapping in freestanding does make this kind of attack much more difficult to pull off, but with contemporary hardware it does not fully prevent it.

What if the trojan is in microcode? No amount of bootstrap in freestanding can protect you here.


It is true that there are many layers of code below the OS level. UEFI, for example, is probably hundreds of thousands of lines of compiled code. Modern processors have the Intel ME and equivalents with their own secret firmware. Almost all modern peripherals have microcontrollers with their own compiled code.

These are all genuine attack vectors but they are not really solvable from the software side. At least for Onramp I consider these problems to be out of scope. It may be possible to solve these with open hardware but a solution will look very different from the kind of software bootstrapping we're doing.


Boot from an obfuscated VM running on an FPGA softcore?

Maybe on two completely different ones, and compare for differences.


Correct me if I’m wrong, but isn’t this recreating a thing that used to exist? I have memories of being told of a compiler older than GCC that could compile itself using… I want to say a bash script. It took forever to run because you had to run the script which of course was slow, and then it output a completely unoptimized compiler. And if memory serves that output didn’t have any of the optimization logic in it. So you had to compile it again to get the optimizer passes to be compiled in, then compile it again to get a fast compiler (self optimization).


Impressive work and truly necessary! Thanks for sharing.


Really nice, and impressive work. However, I'm left wondering whether a route via Forth would not have been a lot shorter.


Surprisingly, it took until last year for someone to actually ask Ken for the code: https://research.swtch.com/nih


Thanks for the link. I had saved it for later reading, but didn't get around to doing it until now.

It's such a well-written article.

Here's previous HN discussion about it:

https://news.ycombinator.com/item?id=38020792


Thanks! Macroexpanded:

Running the "Reflections on Trusting Trust" Compiler - https://news.ycombinator.com/item?id=38020792 - Oct 2023 (67 comments)

(posted by rsc himself!)


>"infinite spinner"

>"This site uses features not available in older browsers."

>Enable Javascript.

Black plaintext on white background with hyperlinks. Revolutionary.



For more, including information about the "Diverse Double-Compiling" (DDC) countermeasure, see my page: https://dwheeler.com/trusting-trust/


Question from reading the abstract: how does one acquire this "second (trusted) compiler" when you genuinely suspect you're facing an adversary using an attack like this one?


The second compiler can be subverted, as long as it's not subverted in the same way.

In particular, it could be one you wrote yourself. It doesn't need to support everything, just enough to compile the compiler under test.

It could also be a compiler from a different group/organization.

You could use many compilers and test with all of them.
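As a sketch of how the DDC comparison works, here's a toy simulation in Python where "binaries" are just closures. All the names are invented for the sketch; real DDC compares actual binaries bit-for-bit:

```python
# Toy sketch of Wheeler's Diverse Double-Compiling (DDC).

def make_clean_binary():
    def run(source):
        return ("clean", source)            # deterministic, honest output
    return run

def make_evil_binary():
    def run(source):
        if "compiler" in source:            # re-inserts itself into compilers
            return ("evil", source)
        return ("clean", source)
    return run

def build(binary, compiler_source):
    """'Run' a compiler binary on compiler source, yielding a new binary."""
    kind, _ = binary(compiler_source)
    return make_evil_binary() if kind == "evil" else make_clean_binary()

def ddc_check(distributed_binary, trusted_binary, compiler_source):
    """Rebuild the compiler via the trusted path and via self-compilation,
    then compare the results (here: behaviorally, on a probe input)."""
    stage1 = build(trusted_binary, compiler_source)
    stage2 = build(stage1, compiler_source)
    self_built = build(distributed_binary, compiler_source)
    probe = "some compiler source"
    return stage2(probe) == self_built(probe)

src = "my compiler source"
print(ddc_check(make_clean_binary(), make_clean_binary(), src))  # True
print(ddc_check(make_evil_binary(), make_clean_binary(), src))   # False
```

If the second compiler is subverted differently, or not at all, the two rebuild paths disagree and the trojan is exposed.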


From the ACM paper [0]:

> Acknowledgment. I first read of the possibility of such a Trojan horse in an Air Force critique [4] of the security of an early implementation of Multics. I cannot find a more specific reference to this document. I would appreciate it if anyone who can supply this reference would let me know.

ofc the US Gov was behind this. Incredible.

[0]: https://dl.acm.org/doi/pdf/10.1145/358198.358210


I'm not sure it is fair to claim that the government was behind this, in that paying for a security study is different from running a program to develop malicious code.


There is something nightmarish about this kind of exploit, and that is maybe why we've been collectively in denial for such a long time.

How many supply-chain-generated backdoors are in the wild today?


One of my favorite short stories plays on the natural horror of this realization: https://www.teamten.com/lawrence/writings/coding-machines/


Thank you! I was immediately reminded of that story when I read the article, but would never have been able to find it!


Supply-chain backdoors are one thing, but the saving grace of the backdoors injected by trusting-trust attacks like in the OP is that they're quite fragile. If the shape of the expected target program changes sufficiently, the backdoor will fail to propagate, and propagating the backdoor into programs that have yet to be written requires an effectively omniscient adversary. As long as you have multiple implementations of a toolchain, you can repeatedly use them to compile each other in every possible configuration (and feed the outputs of those compilation steps back as inputs to the same process), then compare that the outputs are bit-for-bit identical, until you reach a point where you're assured that either none of the compilers are backdoored or all of them would have to be.
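A toy illustration of that fragility (the trigger pattern here is invented): the trojan has to recognize its target by exact shape, so a semantically equivalent refactor stops it from propagating:

```python
# The trojan pattern-matches the code it wants to backdoor.
TRIGGER = "if password == stored:"

def trojaned_compile(source):
    binary = "BIN:" + source
    if TRIGGER in source:
        binary += "+BACKDOOR"
    return binary

original   = "def login():\n    if password == stored:\n        grant()"
refactored = "def login():\n    if stored == password:\n        grant()"

print("+BACKDOOR" in trojaned_compile(original))    # True: pattern matches
print("+BACKDOOR" in trojaned_compile(refactored))  # False: a trivial,
                                                    # equivalent change defeats it
```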

So yes, they're scary, but there are countermeasures.


I mean, this class of attack is defeated as a matter of course in high-reliability applications like aerospace. You just check the compiler output precisely corresponds to the compiler input. You need to do that anyway to prevent miscompilation errors in general (this class of attack is just intentional miscompilation), so it hardly counts as a nightmarish problem, just an annoying one.


> You just check the compiler output precisely corresponds to the compiler input.

just is doing a lot of work there.


What do you want me to say? That is the work that you must do, and that is routinely done. It is annoying, but it is not hard or special. Hardly a nightmarish problem if it gets solved routinely at scale across an entire industry.


How is this normally done? Manual review of machine code?


I do not work on verification directly, so I do not know the details. I believe that at a high level you generally have an automated first pass which generates a source<->machine trace. You then have multiple independent reviews validating the correctness and exhaustiveness of the trace.

Generally in these environments you build with optimizations off unless they are needed, to ease this validation process, since, obviously, you must validate the released/deployed version.



Open-source hardware, including silicon, board designs, and firmware, coupled with deep supply-chain surveillance, is needed to mitigate this threat. There is no other way, short of destructive and non-destructive testing (NDT) of samples with verification. Also, an entire board of chips needs to be superficially verifiable over JTAG or some bus. There's nowhere to hide if a chip can be X-ray matched to its design and to its RTL model.


Clicked on the first Bell Labs link hoping to read the original and ended up on some VPN service's landing page. Sigh. :(



I preferred when link rot led to a 404 or site not found. I have recently been working with some stuff I wrote ~2004 and so many links are now dead. I've been giving archive.org a real workout.


It looks like the xz hack riffs on this idea in how it gets the payload into the final binary and attempts to hide itself.


I think UNIX will be secure; the next generation of AI will be able to scan the image of a compiled kernel directly and will be able to detect harmful things -- if not all of them.


A small side note: the website looks like a classic web page, BUT if you visit it with a privacy-attentive WebVM [1] you first get a "javascript required to view this site" message, even though perfectly classic-looking and modern-looking websites work very well without JS. If you then enable JS but keep third-party content restricted, you get a "page does not exist" instead...

[1] The monsters normally known as "browsers", which by now resemble a JVM more than they do `less`, `more`, `most`, and so on.

A critique just to say: please, if your audience is some tech-savvy cohort, try to design things for them.


What's the potential percentage of malware in 100GB (100,000,000,000 bytes) on your system if a backdoor occupies 100 bytes?


100GB of what?

Text documents? Word documents? CSV files? Excel files? Cat videos? Porn videos? Source code from a trusted source? Source code from randos? Games or apps installed from a marketplace? Games or apps installed as downloads directly from random websites?

It's like asking what's the chance you'll die from a fall if you live 80 years. The chance is hard to know, but it's probably more likely if you're a free solo rock climber than if you aren't.


Sure: modern Mac/Windows/Linux system files with browser cache, plus code and data files after a build. This is enough to never find out about those 100 bytes.
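Spelling out the arithmetic implied by the question upthread, using the figures as stated:

```python
# A 100-byte backdoor hiding among 100 GB of bytes you'd have to audit:
backdoor_bytes = 100
corpus_bytes = 100_000_000_000        # 100 GB as stated
fraction = backdoor_bytes / corpus_bytes
print(fraction)                       # 1e-09, i.e. a ten-millionth of a percent
```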

Not to mention hardware backdoors in chips. Have all the trillions of transistors and their schematics in each chip version been checked for possible backdoors introduced by the Verilog toolchain? One extra connection can provide root access and cost millions of dollars.


Not all data on your PC is equally dangerous. The key is to keep the Trusted Computing Base as small as possible. See: Qubes OS.


QubesOS is amazing! I've been working on QubesOS for several years. When it was just starting out, it was a fairly simple solution, but now it's a big software, and it has just as many problems as any Linux distribution. Complexity is the only parameter by which you can determine the amount of malware. For 0 bytes of useful code, there will be 0 backdoors. I also developed ACPU OS for this reason 12 years ago, but later realized that no one would pay for simplicity.


> but now it's a big software, and it has just as many problems as any Linux distribution

Could you elaborate? The TCB did not increase since almost the beginning of Qubes. Moreover, the hardware-assisted virtualization by default improved the security a lot. Also, Qubes is not a Linux distribution: https://www.qubes-os.org/faq/#is-qubes-just-another-linux-di...


I actually thought his Turing award was for this attack. Turns out it was "just" his acceptance speech!


GNU Mes tries to avoid this by bootstrapping everything.


Yes, and it's not an academic exercise: it's being used in Guix (together with related projects such as stage0, M2-Planet, and Gash) to build everything from source starting from a 357-byte program:

https://guix.gnu.org/en/blog/2023/the-full-source-bootstrap-...

This gives the level of auditability and transparency that we desperately need.

However, the same problem of "yogurt software" manifests at higher levels of the stack, typically with compilers and build systems, some of which still cannot be built from source (GHC and Bazel come to mind).



