
> I wonder if it will spur nvidia to work on an inference only accelerator.

Arguably that's a GPU? Other than (currently) exotic ways to run LLMs, like photonics or giant SRAM tiles, there isn't a device that's better at inference than GPUs, and they have the benefit that they can be used for training as well. You need the same amount of memory and the same ability to do math as fast as possible whether it's inference or training.



> Arguably that's a GPU?

Yes, and to @quadrature's point, NVIDIA is creating GPUs explicitly focused on inference, like the Rubin CPX: https://www.tomshardware.com/pc-components/gpus/nvidias-new-...

"…the company announced its approach to solving that problem with its Rubin CPX— Content Phase aXcelerator — that will sit next to Rubin GPUs and Vera CPUs to accelerate specific workloads."


Yeah, I'm probably splitting hairs here, but as far as I understand (and honestly maybe I don't understand), Rubin CPX is "just" a normal GPU with GDDR instead of HBM.

In fact, I'd say we're looking at this backwards: GPUs used to be the thing that did math fast and put the result into a buffer where something else could draw it to a screen. Now a "GPU" is still a thing that does math fast, but sometimes it doesn't include the hardware to put the pixels on a screen.

So maybe CPX is "just" a GPU, but with more generic naming that aligns with its use cases.


There are some inference chips that are fundamentally different from GPUs. For example, one of the guys who designed Google's original TPU left and started a company (with some other engineers) called Groq (not to be confused with Grok). They make a chip that is quite different from a GPU and provides several advantages over traditional GPUs for inference:

https://www.cdotrends.com/story/3823/groq-ai-chip-delivers-b...


The AMD NPU has more than 2x the performance per watt versus basically any Nvidia GPU. Nvidia isn't leading because they are power efficient.

And no, the NPU isn't a GPU.


Maybe a better way to make my point: the GPU is Nvidia's golden goose, and it's good enough that they may go down with the ship. For example (illustrative numbers): if it costs Nvidia $100 to make a GPU that they can sell to gamers for $2,000, researchers for $5,000, and enterprises for $15,000, would it make sense for them to start from scratch and invest billions to make something that's an unknown amount better today and that would only be interesting to the $15,000 market they've already cornered? (Yes, I'm assuming there are more gamers than people who want to run a local LLM.)


I would submit Google's TPUs are not GPUs.

Similarly, Tenstorrent seems to be building something that you could consider "better", at least insofar as the goal is to be open.


Isn't Etched's Sohu ASIC claimed to be much better than a GPU?

https://www.etched.com/announcing-etched


I'm not very well versed, but I believe that training requires more memory to store intermediate computations (activations) so that you can calculate gradients for each layer.
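
A rough sketch of that point (assuming PyTorch and a CUDA device; the layer count and sizes are made up for illustration): under torch.no_grad() activations are freed as each layer finishes, while a training-mode forward pass keeps them alive until backward() runs, so peak memory is noticeably higher.

    # Rough sketch, assuming PyTorch and a CUDA device (layer count / sizes are illustrative):
    # in inference mode activations are freed layer by layer, while a training forward
    # pass keeps them around so backward() can compute per-layer gradients.
    import torch

    model = torch.nn.Sequential(*[torch.nn.Linear(4096, 4096) for _ in range(8)]).cuda()
    x = torch.randn(64, 4096, device="cuda")

    torch.cuda.reset_peak_memory_stats()
    with torch.no_grad():                      # inference: no activation stash
        model(x)
    print("inference peak MiB:", torch.cuda.max_memory_allocated() / 2**20)

    torch.cuda.reset_peak_memory_stats()
    model(x).sum().backward()                  # training: activations kept for the backward pass
    print("training  peak MiB:", torch.cuda.max_memory_allocated() / 2**20)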


They’re already optimizing GPU die area for LLM inference over other pursuits: the FP64 units in the latest Blackwell GPUs were greatly reduced, and FP4 support was added.
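
To make concrete how narrow FP4 is, here's a small sketch assuming the common E2M1 layout (1 sign bit, 2 exponent bits with bias 1, 1 mantissa bit); that assumption is mine, not a quote of NVIDIA's spec. A 4-bit float can only hold eight distinct magnitudes, which is fine for quantized inference math but useless for FP64-style HPC work.

    # Sketch of FP4 assuming the common E2M1 encoding (an assumption, not an official spec):
    # 1 sign bit, 2 exponent bits with bias 1, 1 mantissa bit.
    def e2m1(sign: int, exp: int, man: int) -> float:
        if exp == 0:                           # subnormal: no implicit leading 1
            mag = man * 0.5
        else:                                  # normal: implicit leading 1
            mag = (1 + man * 0.5) * 2.0 ** (exp - 1)
        return -mag if sign else mag

    # Every non-negative value this FP4 can hold: [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
    print(sorted({e2m1(0, e, m) for e in range(4) for m in range(2)}))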



