
I use Gemini for almost everything. But their model card[1] only compares it to o3-mini! In known benchmarks, o3 is still ahead:

        +------------------------------+---------+--------------+
        |         Benchmark            |   o3    | Gemini 2.5   |
        |                              |         |    Pro       |
        +------------------------------+---------+--------------+
        | ARC-AGI (High Compute)       |  87.5%  |     —        |
        | GPQA Diamond (Science)       |  87.7%  |   84.0%      |
        | AIME 2024 (Math)             |  96.7%  |   92.0%      |
        | SWE-bench Verified (Coding)  |  71.7%  |   63.8%      |
        | Codeforces Elo Rating        |  2727   |     —        |
        | MMMU (Visual Reasoning)      |  82.9%  |   81.7%      |
        | MathVista (Visual Math)      |  86.8%  |     —        |
        | Humanity’s Last Exam         |  26.6%  |   18.8%      |
        +------------------------------+---------+--------------+
[1] https://storage.googleapis.com/model-cards/documents/gemini-...



The text in the model card says the results are from March (including the Gemini 2.5 Pro results), and o3 wasn't released yet.

Is this maybe not the updated card, even though the blog post claims there is one? Sure, the timestamp is in late April, but I seem to remember that the first model card for 2.5 Pro was only released in the last couple of weeks.


o3 is $40/M output tokens and 2.5 Pro is $10-15/M output tokens, so o3 being slightly ahead is not really worth paying up to 4 times more than Gemini.
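
For anyone who wants to check the arithmetic, here is a minimal sketch that uses only the output prices quoted in this comment. The constant and function names are made up for illustration, and input-token pricing is ignored because it isn't mentioned here.

```python
# Output prices in USD per million tokens, as quoted in this thread.
O3_OUTPUT_PRICE = 40.0
GEMINI_OUTPUT_PRICE_LOW = 10.0   # lower end of the quoted $10-15 range
GEMINI_OUTPUT_PRICE_HIGH = 15.0  # upper end of the quoted $10-15 range


def price_ratio(expensive: float, cheap: float) -> float:
    """Return how many times more the expensive price is than the cheap one."""
    return expensive / cheap


ratio_vs_low = price_ratio(O3_OUTPUT_PRICE, GEMINI_OUTPUT_PRICE_LOW)
ratio_vs_high = price_ratio(O3_OUTPUT_PRICE, GEMINI_OUTPUT_PRICE_HIGH)

print(f"o3 output tokens cost {ratio_vs_high:.2f}x to {ratio_vs_low:.2f}x as much")
```

So the "4 times" figure holds against the $10 tier; against the $15 tier the gap is closer to 2.7x.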


Also, o3 is insanely slow compared to Gemini 2.5 Pro.


Not sure why this is being downvoted, but it's absolutely true.

If you're using these models to generate code daily, the costs add up.

Sure, I'll give a really tough problem to o3 (and probably through ChatGPT, not the API), but on general code tasks there really isn't a meaningful enough difference to justify 4x the cost.
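
To put a rough number on "the costs add up", here is a back-of-the-envelope sketch. The daily token volume is a purely hypothetical assumption (plug in your own), the prices are the output-token prices quoted upthread, and input tokens are left out entirely.

```python
def monthly_output_cost(price_per_million: float, tokens_per_day: int, days: int = 30) -> float:
    """Estimate monthly spend on output tokens only, in USD."""
    return price_per_million * tokens_per_day / 1_000_000 * days


# Hypothetical workload: 500k output tokens per day of generated code.
DAILY_OUTPUT_TOKENS = 500_000

o3_cost = monthly_output_cost(40.0, DAILY_OUTPUT_TOKENS)
gemini_cost_low = monthly_output_cost(10.0, DAILY_OUTPUT_TOKENS)
gemini_cost_high = monthly_output_cost(15.0, DAILY_OUTPUT_TOKENS)

print(f"o3:             ${o3_cost:.2f}/month")
print(f"Gemini 2.5 Pro: ${gemini_cost_low:.2f} to ${gemini_cost_high:.2f}/month")
```

Under that assumed volume, the difference is a few hundred dollars a month per developer, and it scales linearly with usage.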



