What is the best model that would run on a 64GB M2 Max? I'm interested in attempting a 70B model, but not gimped down to 2-bit if I can avoid it. Would I be able to run a 4-bit 70B?
The biggest issue with local LLMs is that "best" is extremely relative. As in, best for what application? If you want a generic LLM that does OK at everything, you can try Vicuna. If you want coding, Code Llama 34B is really good. Surprisingly, the 13B code model isn't quite as good, but it's pretty darn close to the 34B.
Not really. A 13B 4-bit fits in about 8GB of RAM, but you have to factor in the unified memory aspect of M-series chips: your "VRAM" and RAM come from the same pool. Your computer uses that same pool to run the OS and applications, which is why you can only load a 13B 4-bit on 16GB of RAM.
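For a rough sanity check, weights take about params × (bits / 8) bytes, plus some headroom for the KV cache and runtime buffers. A quick back-of-the-envelope sketch (the 20% overhead factor is my rough assumption, not a measured number):

```python
# Rough memory estimate for loading a quantized model.
def est_memory_gb(params_billion: float, bits: int, overhead: float = 1.2) -> float:
    """Weights in GB, with ~20% headroom (assumed) for KV cache and runtime."""
    weight_gb = params_billion * 1e9 * bits / 8 / 1e9  # bytes -> GB
    return weight_gb * overhead

for name, params, bits in [("13B @ 4-bit", 13, 4), ("34B @ 4-bit", 34, 4), ("70B @ 4-bit", 70, 4)]:
    print(f"{name}: ~{est_memory_gb(params, bits):.0f} GB")
# 13B @ 4-bit: ~8 GB
# 34B @ 4-bit: ~20 GB
# 70B @ 4-bit: ~42 GB
```

By that math a 4-bit 70B lands around 42GB, which should fit on a 64GB M2 Max, since macOS lets the GPU address roughly 75% of unified memory by default, while a 16GB machine really does top out around a 13B 4-bit once the OS takes its share.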