
> For most current models, you need 40+ GB of RAM to train them. Gradient accumulation doesn't work with batch norm, so you really need that memory.
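The batch-norm caveat in the quote can be illustrated with a minimal sketch (pure Python, no framework assumed): batch norm normalizes each sample using the current batch's own mean and variance, so splitting a large batch into micro-batches changes the normalized activations, and therefore the gradients. Plain loss averaging has no such coupling across samples, which is why gradient accumulation is only equivalent to large-batch training when no batch statistics are involved.

```python
def batch_norm(xs, eps=1e-5):
    """Normalize a list of scalars using the batch's own mean/variance,
    as batch norm does at training time (affine scale/shift omitted)."""
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    return [(x - mean) / (var + eps) ** 0.5 for x in xs]

batch = [1.0, 2.0, 3.0, 10.0]

# One large batch: statistics computed over all 4 samples.
full_out = batch_norm(batch)

# Two accumulated micro-batches: each uses only its own statistics.
micro_out = batch_norm(batch[:2]) + batch_norm(batch[2:])

# The normalized values differ, so the gradients accumulated from the
# micro-batches do not reproduce the large-batch computation.
print(full_out[0], micro_out[0])
```

(Workarounds such as freezing batch-norm statistics, using group norm or layer norm, or synchronized batch norm exist precisely because of this mismatch.)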

There's a decision-tree chart in the article that addresses this; as it points out, plenty of models are much smaller than that.

Not everything is a large language model.



