danielhanchen | 6 days ago | on: Deepseek R1-0528
Oh, you can still run them unquantized! See https://docs.unsloth.ai/basics/llama-4-how-to-run-and-fine-t... where we show how you can offload all MoE layers to system RAM and leave the non-MoE layers on the GPU - the speed is still pretty good!
You can do it via `-ot ".ffn_.*_exps.=CPU"`
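For reference, a minimal sketch of how that flag fits into a full llama.cpp invocation (the GGUF filename and the -ngl value here are illustrative placeholders, not from the original comment):

    # Hypothetical llama.cpp server launch; model path and -ngl are placeholders.
    # -ot / --override-tensor maps tensors whose names match the regex to a
    # given backend buffer: here every MoE expert FFN tensor (".ffn_.*_exps.")
    # stays in system RAM (CPU), while -ngl offloads the remaining layers to GPU.
    ./llama-server \
      -m DeepSeek-R1-0528-Q4_K_M.gguf \
      -ngl 99 \
      -ot ".ffn_.*_exps.=CPU"

The idea is that the expert weights dominate the memory footprint but only a few experts are active per token, so keeping them in system RAM costs less speed than you'd expect, while the dense (non-MoE) layers that run every token stay on the GPU.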
behnamoh | 6 days ago
Thanks, I'll try it! I guess "mixing" GPU+CPU would hurt the perf, though.