Simple measures of complexity lead me to think otherwise. For example: the size of the 6502 design team (8 people, four of them circuit designers, several of whom were doing things like design rule checks, from August 1974 to June 1975, 10 months) versus the size of Facebook's frontend team (within a factor of 3 of 1000 people, with 15 years of work behind them); the number of transistors in the 6502 (3,510) versus the number of bits in Facebook's minified frontend code (on the order of 100 million); and the line count of the Verilog designs for James Bowman's J1A CPU and Chuck Thacker's CPU for STEPS (both under 100 lines of code) versus the line count of just React, which is only a small part of Facebook's code.
You can try to convince people that software that has had 250 times as many people working on it for 18 times as long is "a few orders of magnitude less complex than" the 6502, but I think you're going to have an uphill battle. Three orders of magnitude less complex than the 6502 would be four transistors, a single CMOS NAND or NOR gate.
That's when you could do layout by hand. I thought you meant a modern CPU, which is what that UI runs on. Good luck using Facebook on a 6502, which really brings up the question of all the other subsystems that collaborate with the CPU.
Yes, that's when you had to do layout and design rule checks by hand, and you didn't have SPICE, so you couldn't run a simulation --- you had to breadboard your circuits if you weren't sure of them. Before that, you had to do the gate-level design because you didn't even have Verilog, much less modern high-level HDLs like Chisel, Migen, and Spinal. Before that, you had to desk-check your RTL design, because mainframe time was too expensive to waste finding obvious bugs in designs that hadn't been desk-checked. That's why the 6502 took eight talented people ten months. Nowadays you don't need to do any of that stuff, so it's much easier now to design a CPU than it was in 1974.
It's true that you need a faster CPU than a 6502 to run Facebook, but that's a matter of transistor count and clock speed much more than logical complexity. To a great extent, in fact, since both transistor count and clever design can improve performance, you can trade off transistor count against logical complexity if you're holding performance constant. (As a simple example, a 64-bit processor can be the same logical complexity as a 16-bit processor --- you can even make the bit width a parameter in your Chisel source code. An 8-bit processor needs to be more complex because an 8-bit address space is not practical.) Such a tradeoff is not an option for Intel, who need the best possible price-performance tradeoff in the market, which involves pushing hard on both transistor count and logical design.
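To make the bit-width point concrete, here's a minimal sketch in Python rather than an HDL (the class and its names are made up for illustration; in Chisel or SpinalHDL the analogous move is making the width a constructor parameter of the hardware module). The same logic describes a 16-bit or a 64-bit datapath; only one number changes:

```python
# Hypothetical sketch: a width-parameterized ALU model.
# The logic is identical at every width; only the wrap-around point moves.
class Alu:
    def __init__(self, width):
        self.width = width
        self.mask = (1 << width) - 1  # all-ones mask for an N-bit register

    def add(self, a, b):
        # Same adder description regardless of width; the mask models
        # the finite register truncating the carry out.
        return (a + b) & self.mask

alu16 = Alu(16)
alu64 = Alu(64)
print(alu16.add(0xFFFF, 1))  # wraps to 0 in 16 bits
print(alu64.add(0xFFFF, 1))  # 65536 in 64 bits
```

The point of the sketch is that widening the datapath adds transistors (more bits of adder) but no logical complexity: the source description is unchanged.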
Even if we take Intel's current designs as a reference, it's absurd to suggest that they're even as complex as Facebook's user interface, let alone multiple orders of magnitude more complex. Do you literally think that Intel has hundreds of thousands of employees working on CPU design? They don't even have multiple hundreds of thousands of employees total. Do you literally think that the "source code" for a 64-core, 64-bit, 30-billion-transistor CPU like the Amazon Graviton2 --- thus less than half a billion transistors per core --- is multiple gigabytes? Like, several bytes per transistor?
Let's look at a real CPU design it's plausible to run Fecebutt's UI on. https://github.com/SpinalHDL/VexRiscv is an LGPL-licensed RISC-V core written in SpinalHDL, an embedded DSL in Scala for hardware design. The CPU implementation is a bit less than 10,000 lines of Scala, but only about 2500 of that is the core part, the rest being in the "demo" and "plugin" subdirectories. There's another few thousand lines of C++ for tests. (There's also 40,000 lines of objdump output for tests, but presumably that's mostly disassembled compiler output.) You can run Linux on it, and you can run it on a variety of FPGAs; one Linux-capable Artix 7 configuration runs 1.21 DMIPS/MHz at 170 MHz.
This is not terribly atypical; the Shakti C-class processor from IIT-Madras at https://gitlab.com/shaktiproject/cores/c-class (1.72 DMIPS/MHz) is 33,000 lines of Bluespec.
Shakti and VexRiscv are about two orders of magnitude more complex than a simple CPU design like the J1A or Wirth's RISC, but they are full-featured RISC-V CPUs with reasonable performance, MMUs, cache hierarchies, and multicore cache-coherency protocols, and they can run operating systems like Linux.
In summary, a simple CPU is about a hundred lines of code and is reasonable for one person to write in a day or a few days. A modern RISC-V CPU with all the bells and whistles is about ten thousand lines of code and is reasonable for half a dozen people to write in a year. Facebook's UI is presumably a few million lines of code and has taken around a thousand talented people over a decade to build. Intel's and AMD's CPUs presumably represent around the same order of magnitude of effort, but much of that is the verification needed to avoid a repeat of the Pentium FDIV bug, verification that neither adds to the complexity of the CPU nor is necessary for Facebook's UI or for a core you're running on an FPGA.
Ergo, a full-featured modern CPU is about two or three orders of magnitude less complex than Facebook's UI, and a CPU optimized for simplicity is about two or three orders of magnitude less complex than that.
Aren't you ignoring a whole host of physical design complexities? Power, clock speed, signal integrity, packaging, manufacturability and yield? Yes, implementing the design in an FPGA solves some of those, but not all.
I guess your overall point is that it could be possible to provide people with source code, have them push one button, and get a working bitstream out (just the same as we simply browse to facebook.com and get a working app). That assumes that the designers know the target FPGA and work extra hard to make sure that their design meets timing/power/etc. budgets with any randomized placement and routing for that FPGA. Hmm, yeah, I guess that probably still is easier than creating Facebook's UI, as long as we can assume some constraints.
> it could be possible to provide people with source code, have them push one button, and get a working bitstream out (just the same as we simply browse to facebook.com and get a working app).
Right.
> packaging, manufacturability and yield
Using an FPGA solves those problems.
> signal integrity,
When we're talking about digital computing device design, rather than test instrument design or VHF+ RF design, there's a tradeoff curve between how much performance you get and how much risk you're taking on things like signal integrity, and, consequently, how much effort you need to devote to them.
> know the target FPGA
> timing/power/etc. budgets
> Power, clock speed
Similarly, those are optimizations. Facebook actually has a lot of trouble with power and speed, I think because they don't give a flip --- they aren't the ones who have to buy the new phones. They have trouble delivering messaging functionality on my 1.6GHz Atom that worked fine on a 12MHz 286 with 640K of RAM, so they have something like three orders of magnitude of deoptimization. (The 286 took a couple of seconds to decode a 640x350 GIF, as I recall, and Facebook is quite a bit faster than that at decoding and compositing JPEGs --- because that code is written in C and comes from libjpeg.)