I find this article odd with its fixation on compute speed and 8-bit.
For most current models, you need 40+ GB of GPU memory to train them. Gradient accumulation doesn't work with batch norm, so you really need that memory.
That means either dual 3090/4090 or one of the far more expensive A100/H100 options. Their table suggests the 3080 would be a good deal, but it's not: it doesn't have enough memory for most problems.
If you can do 8-bit inference, don't use a GPU. A CPU will be much cheaper and potentially also lower latency.
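For what it's worth, here is a minimal sketch of CPU-side int8 inference using PyTorch's dynamic quantization; the toy model and shapes are placeholders I made up, not anything from the article:

    import torch
    from torch import nn

    # Toy fp32 model; dynamic quantization rewrites the Linear layers to use
    # int8 weights (activations stay fp32), which runs on a plain CPU.
    model = nn.Sequential(nn.Linear(256, 256), nn.ReLU(), nn.Linear(256, 10)).eval()
    qmodel = torch.ao.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

    with torch.inference_mode():
        out = qmodel(torch.randn(1, 256))  # same call signature as the fp32 model
    print(out.shape)                       # torch.Size([1, 10])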
Also: Almost everyone using GPUs for work will join NVIDIA's Inception program and get rebates... So why look at retail prices?
> Gradient accumulation doesn't work with batch norms so you really need that memory.
Last I looked, very few SOTA models are trained with batch normalization. Most of the LLMs use layer norm, which plays fine with gradient accumulation
(precisely because of the need to avoid the memory blowup).
Note also that batch normalization can be done in a memory-efficient way: it just requires aggregating the batch statistics outside the gradient accumulation loop.
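As a reference point, here is a minimal gradient accumulation sketch in PyTorch (toy LayerNorm model and synthetic data, purely illustrative). With layer norm the statistics are per-sample, so plain accumulation like this is exact; with batch norm you would additionally have to aggregate the batch statistics across the micro-batches, as noted above:

    import torch
    from torch import nn

    # Toy LayerNorm MLP and synthetic micro-batches.
    model = nn.Sequential(nn.Linear(32, 64), nn.LayerNorm(64), nn.ReLU(), nn.Linear(64, 2))
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
    data = [(torch.randn(4, 32), torch.randint(0, 2, (4,))) for _ in range(32)]

    accum_steps = 8  # effective batch = 8 micro-batches x 4 samples
    optimizer.zero_grad()
    for step, (x, y) in enumerate(data):
        loss = nn.functional.cross_entropy(model(x), y)
        (loss / accum_steps).backward()      # gradients sum across micro-batches
        if (step + 1) % accum_steps == 0:
            optimizer.step()
            optimizer.zero_grad()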
I) My research involves biological data (protein-protein interactions) and my datasets are tiny (about 30K high-confidence samples). We have to regularize aggressively and use a pretty tiny network to get something that doesn't over-fit horrendously.
II) We want to run many inferences (10^3 to 10^12) on a personal desktop or cheap OVH server in little time, so we can serve the model on an online portal.
I'm not sure any of this is accurate. 8-bit inference on a 4090 can do 660 Tflops and on an H100 can do 2 Pflops. Not to mention, there is no native support for FP8 (which is significantly better for deep learning) on existing CPUs.
The memory on a 4090 can serve extremely large models. Currently, int4 is starting to be proven out. With 24GB of memory, you can serve 40-billion-parameter models. That, coupled with the fact that GPU memory bandwidth is significantly higher than CPU memory bandwidth, means that CPUs should rarely ever be cheaper or lower latency than GPUs.
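The back-of-the-envelope arithmetic behind that claim (weights only, ignoring KV cache and activations):

    # int4 weights: 4 bits = 0.5 bytes per parameter, weights only.
    params = 40e9
    print(params * 0.5 / 1e9)  # 20.0 GB -> fits in a 24 GB 4090, ~4 GB headroom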
Depends on who you know, but I've seen as low as €799 per new 3090 TI. But you need to waive the right to resell them and there are quotas, for obvious reasons.
Consumer parts are dirt cheap compared to enterprise ones. Most companies are not able to use them at scale due to CUDA license terms. I don't think there is much of a need for rebates here. For hobbyists, it is somewhat of a steep price for the latest cards, but it's already way down from the height of ETH mining a year back.
Yeah I hear it’s common practice now to avoid synchronizing GPU training kernels in order to speed things up, and it has positive regularization benefits and little downside.
They nerfed the heck out of board memory in the 3000 series (3080 20GB was even made in limited quantity... going to miners in China :( ) so color me a bit skeptical.