
I still don't understand what the incentive is for releasing genuinely good model weights. What makes sense however is OpenAI releasing a somewhat generic model like gpt-oss that games the benchmarks just for PR. Or some Chinese companies doing the same to cut the ground from under the feet of American big tech. Are we really hopeful we'll still get decent open weights models in the future?




Because there is no money in making them closed.

Open weight means secondary sales channels like their fine tuning service for enterprises [0].

They can't compete with large proprietary providers but they can erode and potentially collapse them.

Open weights and open research build on each other, advancing everyone who participates and creating an ecosystem that has a shot at competing with proprietary services.

Transparency, control, privacy, cost etc. do matter to people and corporations.

[0] https://mistral.ai/solutions/custom-model-training


Until there is a sustainable, profitable and moat-building business model for generative AI, the competition is not to have the best proprietary model, but rather to raise the most VC money to be well positioned when that business model does arise.

Releasing a near state-of-the-art open model instantly catapults a company to a valuation of several billion dollars, making it possible to raise money to acquire GPUs and train more SOTA models.

Now, what happens if such a business model does not emerge? I hope we won't find out!


Explained well in this documentary [0].

[0] https://www.youtube.com/watch?v=BzAdXyPYKQo


I was fully expecting that but it doesn't get old ;)

It’s funny how future money drives the world. Fortunately it’s fueling progress this time around.

> gpt-oss that games the benchmarks just for PR.

gpt-oss is killing the ongoing AIMO 3 competition on Kaggle. They're using a hidden, new set of IMO-level problems, handcrafted to be "AI hardened". And gpt-oss submissions are at ~33/50 right now, two weeks into the competition. The benchmarks (at least for math) were not gamed at all; these models are really good at math.


Are they ahead of all other recent open models? Is there a leaderboard?

There is a leaderboard [1], but we'll have to wait until April, when the competition ends, to know which models the teams are using. The current number 3 on there (34/50) has mentioned in discussions that they're using gpt-oss-120b. There were also some scores shared for gpt-oss-20b, in the 25/50 range.

The next "public" model is qwen30b-thinking at 23/50.

The competition is limited to one H100 (80 GB) and 5 hours of runtime for 50 problems, so larger open models (DeepSeek, the larger Qwens) don't fit.

[1] https://www.kaggle.com/competitions/ai-mathematical-olympiad...
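
That budget works out to roughly six minutes per problem on average. Here's a minimal sketch of how a submission might spread it over the set, assuming an OpenAI-compatible local server; the endpoint, model name, and timeout handling are my assumptions, not competition details:

    # Sketch: spread the 5h wall-clock budget over 50 problems.
    import time
    from openai import OpenAI, APITimeoutError

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="none")

    TOTAL_BUDGET_S = 5 * 3600  # 5 hours of runtime for the whole set

    def solve_all(problems: list[str]) -> list[str]:
        answers = []
        deadline = time.monotonic() + TOTAL_BUDGET_S
        for i, problem in enumerate(problems):
            # Divide whatever time is left evenly over the remaining problems,
            # so an easy early problem donates its slack to harder later ones.
            remaining = deadline - time.monotonic()
            per_problem = max(remaining / (len(problems) - i), 1.0)
            try:
                resp = client.chat.completions.create(
                    model="gpt-oss-20b",
                    messages=[{"role": "user", "content": problem}],
                    timeout=per_problem,  # per-request timeout in seconds
                )
                answers.append(resp.choices[0].message.content)
            except APITimeoutError:
                answers.append("")  # a blank answer beats blowing the budget
        return answers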


I find the Qwen3 models spend a ton of thinking tokens, which could hamstring them under the runtime limits. gpt-oss-120b is much more focused and steerable there.

The token-usage chart on the release page in the OP demonstrates the Qwen issue well.

Token churn does help smaller models on math tasks, but for general purpose stuff it seems to hurt.
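
The "steerable" part is the reasoning-effort setting. A minimal sketch of comparing token spend across efforts, assuming an OpenAI-compatible local server and the "Reasoning: <level>" system-prompt convention from the gpt-oss model card (endpoint and model name are placeholders):

    # Sketch: compare gpt-oss completion-token spend at each reasoning effort.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="none")

    def tokens_used(problem: str, effort: str) -> int:
        resp = client.chat.completions.create(
            model="gpt-oss-120b",
            messages=[
                # The model card documents setting effort via the system prompt.
                {"role": "system", "content": f"Reasoning: {effort}"},
                {"role": "user", "content": problem},
            ],
        )
        # Assumption: the server counts hidden reasoning tokens here too.
        return resp.usage.completion_tokens

    for effort in ("low", "medium", "high"):
        print(effort, tokens_used("Find the last two digits of 7^2025.", effort))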


gpt-oss are really solid models: by far the best at tool calling, and performant.

Google games benchmarks more than anyone, hence Gemini's strong bench lead. In reality though, it's still garbage for general usage.


