
I find this article odd with its fixation on computing speed and 8bit.

For most current models, you need 40+ GB of RAM to train them. Gradient accumulation doesn't work with batch norms so you really need that memory.

That means either dual 3090/4090 or one of the extra expensive A100/H100 options. Their table suggests the 3080 would be a good deal, but it's not. It doesn't have enough RAM for most problems.

If you can do 8bit inference, don't use a GPU. CPU will be much cheaper and potentially also lower latency.
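Roughly, that path looks like this with PyTorch's dynamic quantization (a sketch; the toy model is a stand-in for a real network, and production setups more often go through ONNX Runtime or similar):

    import torch
    from torch import nn
    from torch.ao.quantization import quantize_dynamic

    # Toy float model standing in for a real network.
    model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 128)).eval()

    # Replace Linear layers with int8-weight versions; activations are quantized on the fly (CPU path).
    qmodel = quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

    with torch.inference_mode():
        out = qmodel(torch.randn(1, 512))
    print(out.shape)  # torch.Size([1, 128])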

Also: Almost everyone using GPUs for work will join NVIDIA's Inception program and get rebates... So why look at retail prices?




> Gradient accumulation doesn't work with batch norms so you really need that memory.

Last I looked, very few SOTA models are trained with batch normalization. Most of the LLMs use layer norm, which works fine with gradient accumulation (precisely because of the need to avoid the memory blowup).

Note also that batch normalization can be done in a memory-efficient way: it just requires aggregating the batch statistics separately from the gradient accumulation.
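For reference, a minimal PyTorch sketch of plain gradient accumulation (the layer sizes, data, and accumulation factor are all made up):

    import torch
    from torch import nn

    # Toy setup; sizes and data are placeholders.
    model = nn.Sequential(nn.Linear(128, 256), nn.LayerNorm(256), nn.ReLU(), nn.Linear(256, 10))
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
    loader = [(torch.randn(16, 128), torch.randint(0, 10, (16,))) for _ in range(32)]

    accum_steps = 8  # effective batch = 8 micro-batches x 16 samples = 128
    optimizer.zero_grad()
    for step, (x, y) in enumerate(loader):
        loss = nn.functional.cross_entropy(model(x), y)
        (loss / accum_steps).backward()  # gradients sum across micro-batches
        if (step + 1) % accum_steps == 0:
            optimizer.step()
            optimizer.zero_grad()

    # LayerNorm normalizes each sample independently, so the accumulated gradient
    # matches a single 128-sample batch; BatchNorm would compute its statistics
    # over only 16 samples per step, which is what breaks naive accumulation.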


wav2vec2, whisper, HifiGAN, Stable Diffusion, and Imagen all use BatchNorm.


> It doesn't have enough RAM for most problems.

It might not be as glamorous or make as many headlines, but there is plenty of research that goes on below 40 GB.

While I most commonly use A100s for my research, all my models fit on my personal RTX 2080.


I wonder, are you trying to work around the limit, or did it just happen like this?


We're trying to work around the limits:

I) My research involves biological data (protein-protein interactions) and my datasets are tiny (about 30K high-confidence samples). We have to regularize aggressively and use a pretty tiny network (see the sketch below the list) to get something that doesn't overfit horrendously.

II) We want to run many inferences (10^3 to 10^12) on a personal desktop or a cheap OVH server in little time, so we can serve the model on an online portal.
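For illustration, the kind of tiny, heavily regularized model I mean (the layer sizes, dropout rate, weight decay, and the 1024-dim input are all made-up stand-ins for the real pair features):

    import torch
    from torch import nn

    # Tiny MLP for a ~30K-sample dataset; every number here is illustrative.
    model = nn.Sequential(
        nn.Linear(1024, 64),  # 1024-dim input stands in for the real protein-pair features
        nn.ReLU(),
        nn.Dropout(0.5),      # aggressive dropout
        nn.Linear(64, 1),     # single interaction score
    )
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-2)
    print(sum(p.numel() for p in model.parameters()))  # ~66K parameters, trivially fits on a 2080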


I'm not sure any of this is accurate. 8-bit inference on a 4090 can do 660 Tflops and on an H100 can do 2 Pflops. Not to mention, there is no native support for FP8 (which is significantly better for deep learning) on existing CPUs.

The memory on a 4090 can serve extremely large models. Currently, int4 is starting to be proven out. With 24GB of memory, you can serve 40 billion parameter models. That, coupled with the fact that GPU memory bandwidth is significantly higher than CPU memory bandwidth, means that CPUs should rarely ever be cheaper / lower latency than GPUs.
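Back-of-the-envelope for the weights alone (activations and any KV cache come on top of this):

    params = 40e9                    # 40B-parameter model
    bytes_per_param = 0.5            # int4 = 4 bits per weight
    weights_gib = params * bytes_per_param / 2**30
    print(f"{weights_gib:.1f} GiB")  # ~18.6 GiB, which leaves some headroom within 24 GB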


> Almost everyone using GPUs for work will join NVIDIA's Inception program and get rebates... So why look at retail prices?

They need to advertise it better; this is the first time I've heard of it.

What are the prices like there? GPUs/workstations?


Depends on who you know, but I've seen as low as €799 per new 3090 Ti. But you need to waive the right to resell them, and there are quotas, for obvious reasons.


Consumer parts are dirt cheap compared to enterprise ones. Most companies are not able to use them at scale due to CUDA license terms. I don't think there is much of a need for rebates here. For hobbyists, it is somewhat of a steep price for the latest cards, but it's already way down from the height of ETH mining a year back.


> For most current models, you need 40+ GB of RAM to train them. Gradient accumulation doesn't work with batch norms so you really need that memory.

There's a decision-tree chart in the article that addresses this; as it points out, there are plenty of models that are much smaller than that.

Not everything is a large language model.


> Almost everyone using GPUs for work will join NVIDIA's Inception program and get rebates... So why look at retail prices?

So maybe they were including information for the hobbyists/students who do not need or cannot afford the latest and greatest professional cards?


> If you can do 8bit inference, don't use a GPU. CPU will be much cheaper and potentially also lower latency.

Good advice. Does that mean I can install, say, 64 GB of RAM in a PC and run those models in comparable time?


That's how cloud speech recognition is usually deployed. OpenAI Whisper is faster than realtime on regular desktop CPUs, which I guess is good enough.

And for a datacenter, a few $100 AMD CPUs will beat a single $20k NVIDIA A100 at throughput per dollar.
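For a sense of what CPU-only Whisper looks like in practice, a minimal sketch with the openai-whisper package (the audio file and the "base" model size are placeholders; bigger models are correspondingly slower):

    import whisper  # pip install openai-whisper

    model = whisper.load_model("base", device="cpu")
    result = model.transcribe("audio.wav", fp16=False)  # FP16 isn't supported on CPU
    print(result["text"])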


> Also: Almost everyone using GPUs for work will join NVIDIA's Inception program and get rebates... So why look at retail prices?

Out of curiosity, does that also apply for consumer grade GPUs?


You can get RTX A6000s, but not 3090s or 4090s, via Inception.


Prices? Prebuilt workstations? Or should we just apply and see? Is it that easy?


Companies with 4+ employees only.


FAQ says 1 developer.


The client I work with can order 3090 Ti and 4090 cards through their Inception link (in Germany). Apparently, it varies by partner.


I would be surprised if it did. But you probably shouldn't do professional work on GPUs that lack ECC memory.


The lack of ECC memory is almost certainly not a factor. If you can train at FP8, your model will recover from a single flipped bit somewhere.


I mean you could even view bit flips as a regularization technique like dropout...


Yeah I hear it’s common practice now to avoid synchronizing GPU training kernels in order to speed things up, and it has positive regularization benefits and little downside.


Anyone know if the GPUs are relatively affordable through Inception?


The 4090 Ti is rumored to have 48GB of VRAM, so one can only hope.


They nerfed the heck out of board memory in the 3000 series (the 3080 20GB was even made in limited quantity... going to miners in China :( ), so color me a bit skeptical.



