
gpt-oss-120b routes each token to 4 experts and combines their outputs.
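For intuition, here's a minimal sketch of top-k expert routing in Python with numpy, using k=4 as in gpt-oss-120b. The router weights, expert functions, and shapes are illustrative, not the model's actual implementation:

    import numpy as np

    def moe_layer(x, router_w, experts, k=4):
        """Top-k mixture-of-experts routing sketch. `experts` is a
        list of per-expert feed-forward functions; `router_w` maps
        the hidden state to one logit per expert."""
        logits = router_w @ x                # (num_experts,)
        top = np.argsort(logits)[-k:]        # indices of the k best experts
        weights = np.exp(logits[top])
        weights /= weights.sum()             # softmax over the chosen k only
        # Weighted combination of the selected experts' outputs;
        # the other experts' weights are never touched this step.
        return sum(w * experts[i](x) for w, i in zip(weights, top))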

I don't know how LM Studio works; I only know the fundamentals. There is no way it's sending experts to the GPU per token. Also, the CPU doesn't have much work to do during decode: it's mostly waiting on memory.
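A rough back-of-envelope supports the memory-bound claim: every decode step has to stream the active weights from RAM, so bandwidth, not compute, sets the ceiling. The numbers below are assumptions (OpenAI quotes roughly 5.1B active parameters per token for gpt-oss-120b; ~0.5 bytes/param for 4-bit weights and ~100 GB/s for a desktop are guesses):

    # Why decode is memory-bound: all numbers here are assumptions.
    active_params = 5.1e9       # ~active params per token (gpt-oss-120b)
    bytes_per_param = 0.5       # ~4-bit quantized weights
    bandwidth = 100e9           # bytes/s, typical desktop DRAM

    bytes_per_token = active_params * bytes_per_param   # ~2.6 GB read/token
    tokens_per_sec = bandwidth / bytes_per_token        # ~39 tok/s ceiling
    print(f"{tokens_per_sec:.0f} tok/s upper bound from bandwidth alone")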



> There is no way it's sending experts to the GPU per token.

Right. It seems like either the chosen experts are fairly stable across sequential tokens, or there are more than 4 experts resident in GPU memory and sequential tokens tend to route within that resident set, as the poster said. A quick way to test that is sketched below.
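One way to check the hypothesis, if you had a per-token routing trace: replay it against a fixed-size LRU cache of experts and measure the hit rate. Everything here (trace format, cache size) is hypothetical:

    from collections import OrderedDict

    def expert_hit_rate(routing_trace, cache_size=16):
        """Given a trace of which 4 experts each token picked, e.g.
        [(3, 17, 42, 90), ...], measure how often the next token's
        experts are already resident in a fixed-size on-GPU cache,
        evicting least-recently-used. Sizes are illustrative."""
        cache, hits, total = OrderedDict(), 0, 0
        for chosen in routing_trace:
            for e in chosen:
                total += 1
                if e in cache:
                    hits += 1
                    cache.move_to_end(e)           # refresh LRU position
                else:
                    cache[e] = True
                    if len(cache) > cache_size:
                        cache.popitem(last=False)  # evict LRU expert
        return hits / total

A high hit rate over a real trace would support the "experts are stable across sequential tokens" explanation; a low one would point at something else entirely.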



