
> I wonder if it will spur nvidia to work on an inference only accelerator.

Arguably that's a GPU? Other than (currently) exotic ways to run LLMs, like photonics or giant SRAM tiles, there isn't a device that's better at inference than GPUs, and they have the benefit that they can be used for training as well. You need the same amount of memory and the same ability to do math as fast as possible whether it's inference or training.



> Arguably that's a GPU?

Yes, and to @quadrature's point, NVIDIA is creating GPUs explicitly focused on inference, like the Rubin CPX: https://www.tomshardware.com/pc-components/gpus/nvidias-new-...

"…the company announced its approach to solving that problem with its Rubin CPX— Content Phase aXcelerator — that will sit next to Rubin GPUs and Vera CPUs to accelerate specific workloads."


Yeah, I'm probably splitting hairs here, but as far as I understand (and honestly maybe I don't understand), Rubin CPX is "just" a normal GPU with GDDR instead of HBM.

In fact, I'd say we're looking at this backwards: GPUs used to be the thing that did math fast and put the result into a buffer where something else could draw it to a screen. Now a "GPU" is still a thing that does math fast, but sometimes it doesn't include the hardware to put the pixels on a screen.

So maybe CPX is "just" a GPU, but with more generic naming that aligns with its use cases.


There are some inference chips that are fundamentally different from GPUs. For example, one of the guys who designed Google's original TPU left and started a company (with some other engineers) called Groq (not to be confused with Grok). They make a chip that is quite different from a GPU and provides several advantages over traditional GPUs for inference:

https://www.cdotrends.com/story/3823/groq-ai-chip-delivers-b...


The AMD NPU has more than 2x the performance per watt versus basically any Nvidia GPU. Nvidia isn't leading because they are power efficient.

And no, the NPU isn't a GPU.


Maybe a better way to make my point: the GPU is Nvidia's golden goose, and it's good enough that they may go down with the ship. For example (illustrative numbers): if it costs Nvidia $100 to make a GPU that they can sell to gamers for $2,000, researchers for $5,000, and enterprises for $15,000, would it make sense for them to start from scratch and invest billions to make something that's an unknown amount better today and that would only be interesting to the $15,000 market they've already cornered? (Yes, I'm assuming there are more gamers than people who want to run a local LLM.)


I would submit Google's TPUs are not GPUs.

Similarly, Tenstorrent seems to be building something that you could consider "better", at least insofar as the goal is to be open.


Isn't Etched's Sohu ASIC claimed to be much better than a GPU?

https://www.etched.com/announcing-etched


I'm not very well versed, but I believe that training requires more memory to store intermediate computations (activations) so that you can calculate gradients for each layer.
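
A rough sketch of that point (assuming PyTorch and a CUDA device; the layer count and sizes are made up for illustration): under torch.no_grad() activations are freed as each layer finishes, while a training-mode forward pass keeps them alive until backward() runs, so peak memory is noticeably higher.

    # Rough sketch, assuming PyTorch and a CUDA device (layer count / sizes are illustrative):
    # in inference mode activations are freed layer by layer, while a training forward
    # pass keeps them around so backward() can compute per-layer gradients.
    import torch

    model = torch.nn.Sequential(*[torch.nn.Linear(4096, 4096) for _ in range(8)]).cuda()
    x = torch.randn(64, 4096, device="cuda")

    torch.cuda.reset_peak_memory_stats()
    with torch.no_grad():                      # inference: no activation stash
        model(x)
    print("inference peak MiB:", torch.cuda.max_memory_allocated() / 2**20)

    torch.cuda.reset_peak_memory_stats()
    model(x).sum().backward()                  # training: activations kept for the backward pass
    print("training  peak MiB:", torch.cuda.max_memory_allocated() / 2**20)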


They’re already optimizing GPU die area for LLM inference over other pursuits: the FP64 units in the latest Blackwell GPUs were greatly reduced, and FP4 support was added.
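
To make concrete how narrow FP4 is, here's a small sketch assuming the common E2M1 layout (1 sign bit, 2 exponent bits with bias 1, 1 mantissa bit); that assumption is mine, not a quote of NVIDIA's spec. A 4-bit float can only hold eight distinct magnitudes, which is fine for quantized inference math but useless for FP64-style HPC work.

    # Sketch of FP4 assuming the common E2M1 encoding (an assumption, not an official spec):
    # 1 sign bit, 2 exponent bits with bias 1, 1 mantissa bit.
    def e2m1(sign: int, exp: int, man: int) -> float:
        if exp == 0:                           # subnormal: no implicit leading 1
            mag = man * 0.5
        else:                                  # normal: implicit leading 1
            mag = (1 + man * 0.5) * 2.0 ** (exp - 1)
        return -mag if sign else mag

    # Every non-negative value this FP4 can hold: [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
    print(sorted({e2m1(0, e, m) for e in range(4) for m in range(2)}))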



