Hacker News

It's worth noting that this model is not general purpose, whereas the ones Google and OpenAI used were general purpose.




Both OpenAI and Google used models made specifically for the task, not their general-purpose products.

OpenAI: https://xcancel.com/alexwei_/status/1946477756738629827#m "we are releasing GPT-5 soon, and we’re excited for you to try it. But just to be clear: the IMO gold LLM is an experimental research model. We don’t plan to release anything with this level of math capability for several months."

DeepMind: https://deepmind.google/blog/advanced-version-of-gemini-with... "we additionally trained this version of Gemini on novel reinforcement learning techniques that can leverage more multi-step reasoning, problem-solving and theorem-proving data. We also provided Gemini with access to a curated corpus of high-quality solutions to mathematics problems, and added some general hints and tips on how to approach IMO problems to its instructions."


https://x.com/sama/status/1946569252296929727

> we achieved gold medal level performance on the 2025 IMO competition with a *general-purpose* reasoning system! to emphasize, this is an LLM doing math and not a specific formal math system; it is part of our main push towards general intelligence.

asterisks mine


DeepSeekMath-V2 is also an LLM doing math and not a specific formal math system. What interpretation of "general purpose" were you using where one of them is "general purpose" and the other isn't?

This model can’t be used for, say, questions on biology or history.

How do you know how well OpenAI's unreleased experimental model does on biology or history questions?

Sam specifically says it is general purpose, and there's also this:

> Typically for these AI results, like in Go/Dota/Poker/Diplomacy, researchers spend years making an AI that masters one narrow domain and does little else. But this isn’t an IMO-specific model. It’s a reasoning LLM that incorporates new experimental general-purpose techniques.

https://x.com/polynoamial/status/1946478250974200272


You are overinterpreting what they said again. "Go/Dota/Poker/Diplomacy" do not use LLMs, which means they are not considered "general purpose" by them. And to prove it to you, look at the OpenAI IMO solutions on GitHub, which clearly show it's not a general-purpose trained LLM, given how the words and sentences are generated there. These are models specifically fine-tuned for math.

They could not have been more clear. Sorry, but are you even reading?

Clear about what? Do you know the difference between an LLM based on transformer attention and a Monte Carlo tree search system like the one used in Go? You do not understand what they are saying. It was a fine-tuned model, just as DeepSeekMath is an LLM fine-tuned for math, which means it was a special-purpose model. Read the OpenAI GitHub IMO submissions to see the proof.

Not true


Do note that that is a different model. The one we are talking about here, DeepSeekMath-V2, is indeed overcooked with math RL. It's so eager to solve math problems that it even comes up with random ones if you prompt it with "Hello".

https://x.com/AlpinDale/status/1994324943559852326?s=20



Oh, you may be correct. Are these models general purpose or fine-tuned for mathematics?


