
LLMs seem like the least efficient way to accomplish this. NAND gates, for example, are inherently 1-bit operators, but LLMs spend many more bits than that on every weight and activation. If the weights are all binary, then gradients are restricted to -1, 0, and 1, which doesn't give you much room for incremental improvement. You can add extra bits back, but that's pure overhead. But all this is beside the real issue, which is that LLMs, and NNs in general, are inherently fuzzy: they guess. Circuits aren't; we have perfect simulators for them.
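For concreteness, this is roughly how binarized-weight training (BinaryConnect-style straight-through estimators) handles it: you keep full-precision shadow weights around purely so small gradient updates have somewhere to accumulate. A rough numpy sketch, names and details mine:

  import numpy as np

  # Full-precision "shadow" weights: the pure overhead mentioned above.
  # The forward pass only ever sees their signs.
  w_real = np.random.randn(4)

  def binarize(w):
      # Strictly {-1, +1}; np.sign would emit 0 at exactly 0.
      return np.where(w >= 0, 1.0, -1.0)

  def forward(x):
      return x @ binarize(w_real)

  # Straight-through estimator: pretend binarize() has gradient 1,
  # so tiny updates accumulate in the float shadow weights, which a
  # pure 1-bit weight never could.
  def backward(x, grad_out, lr=0.01):
      global w_real
      w_real -= lr * grad_out * x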

Consider how humans design things. We don't talk ourselves through every CPU cycle to convince ourselves a design works; we use bespoke tooling. Not all problems are language-shaped.
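E.g., exhaustively checking a small NAND circuit is a few lines of exact tooling, no guessing involved. Toy sketch (the XOR-from-four-NANDs construction is standard; the code is mine):

  # Exhaustively verify a NAND-only circuit: exact answers, no guessing.
  # Circuit under test: XOR built from four NANDs.
  def nand(a, b):
      return 1 - (a & b)

  def xor_from_nands(a, b):
      t = nand(a, b)
      return nand(nand(a, t), nand(b, t))

  # "Perfect simulator": check every input combination, not a sample.
  for a in (0, 1):
      for b in (0, 1):
          assert xor_from_nands(a, b) == (a ^ b)
  print("XOR-from-NANDs verified over all 4 inputs")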



