The text in the model card says the results are from March (including the Gemini 2.5 Pro results), and o3 hadn't been released yet at that point.
Is this maybe not the updated card, even though the blog post claims there is one? Sure, the timestamp is late April, but I seem to remember the first model card for 2.5 Pro only came out in the last couple of weeks.
Not sure why this is being downvoted, but it's absolutely true.
If you're using these models to generate code daily, the costs add up.
Sure, I'll give a really tough problem to o3 (and probably via ChatGPT rather than the API), but on general coding tasks there isn't a meaningful enough difference to justify 4x the cost.