Depends heavily on the architecture too; I think the free-for-all to find the best sizes is still ongoing, and rightly so. GPT-OSS-120B, for example, fits in around 61 GB of VRAM for me when quantized to MXFP4.
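For anyone wanting to sanity-check that number, here's a rough back-of-envelope sketch in Python. The ~4.25 bits/weight figure for MXFP4 (4-bit values plus a shared 8-bit scale per 32-element block) and the ~117B total parameter count are my assumptions, and this ignores KV cache, activations, and any tensors the runtime keeps in BF16:

    # Rough weight-memory estimate. MXFP4 assumed at ~4.25 bits/weight
    # (4-bit values + one 8-bit scale per 32-element block). Ignores
    # KV cache, activations, and tensors kept at higher precision.
    def weight_gib(n_params: float, bits_per_weight: float) -> float:
        return n_params * bits_per_weight / 8 / 2**30

    print(f"MXFP4: {weight_gib(117e9, 4.25):.1f} GiB")  # ~57.9 GiB
    print(f"BF16:  {weight_gib(117e9, 16.0):.1f} GiB")  # ~217.9 GiB

The ~58 GiB for the quantized weights lines up roughly with the ~61 GB I actually see once you add the non-quantized tensors and runtime overhead.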
Personally, I hope GPU makers instead start adding more VRAM, or if one can dream, expandable VRAM.
Well, GPUs are getting more VRAM, although it's pricey. We didn't use to have 96GB VRAM GPUs at all, and now they exist :) For those who can afford it, it's at least possible today. Slowly it increases.
Hehe, me too… went all out on an MBP in 2022, did it again in April. The only upgrade I didn't bother with was topping out at 128 GB of RAM instead of 64. Then GPT-OSS 120B comes out and quickly makes me very sad I can't use it locally.
Same. I repeatedly kick myself for not getting the 128GB version, though not because of GPT-OSS; I really haven't been too impressed with it (through cloud providers). At this point it's probably best to wait until the M5 Max is out, since the new GPU neural accelerators should greatly speed up prompt processing.