
> they spend the same amount of "thinking time" on "what's 2+2?" as they do on complex mathematical proofs.

Not anymore. Have you seen Gemini 2.5 Pro? Ask it simple questions and it almost doesn't "think". Ask it a coding question and it'll write a long reasoning article. I think the same goes for o3.




The original o1 also didn't do this. Neither did the actual DeepSeek R1. You could even get it to answer immediately without any reasoning tokens. These highly distilled versions just lost most of their common sense for this.


Well, it does overthink quite a bit. If it can reduce the overthinking, it's gonna be useful.


Overthinking is subjective. It really depends on how much you value the answer.

"how long break distance does a train need if going in 100 km/hour?"

Do you just need a quick reply and don't care so much (maybe a shower thought)? Or does life and death depend on the answer?

The same question can need different amounts of thinking.
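For a sense of what the "quick reply" version of that question involves, here's a back-of-the-envelope estimate using the standard kinematics formula d = v² / (2a); the deceleration figure is an assumption (roughly a passenger train's service braking), not something from the thread, and a real answer would depend on the factors discussed below.

```python
# Back-of-the-envelope braking distance: d = v^2 / (2 * a)
# Assumption: service-braking deceleration of ~0.7 m/s^2 for a passenger
# train. Real answers depend on train mass, brake type, grade, rail
# conditions, etc.

v_kmh = 100.0
v = v_kmh / 3.6            # convert km/h to m/s (~27.8 m/s)
a = 0.7                    # assumed deceleration in m/s^2

d = v ** 2 / (2 * a)       # stopping distance in metres
print(f"~{d:.0f} m to stop from {v_kmh:.0f} km/h at {a} m/s^2")
# -> roughly 550 m under these assumptions
```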


> does life and death depend on the answer?

In this situation I suspect you'd still want the answer quickly.


Huge assumption. There's a wide range of parameters that go into how accurate you need a response to be, depending on context. Just as there are questions where you need a 100% accurate response regardless of response time, I'm sure there are questions at the other extreme.


In this situation you would have someone with actual knowledge of the mechanics involved do the computation using the actual data (e.g., what's the mass of the train? Which kind of brakes does it have?) instead of asking an LLM and trusting it to give the correct answer without checking.


Assuming you could find an expert like that in time, and that they would then be able to understand and solve the problem fast enough to still be helpful.

If you need the answer within a couple of hours, you can probably get it from an expert; if you need an actionable answer within minutes, based on some back-of-the-envelope calculations, then a SOTA LLM is a much safer bet than flagging down whoever seems smartest in the room and asking them for help.


I assumed we already did such calculations in advance, as that's needed to have proper safety measures.


Why? Let's say you are designing a railway system. It does not matter if it takes 1 sec or an hour if the planning process is months long.


What I really don't like is that I can't manually decide how much thinking Gemini should allocate to a prompt. You're right that sometimes it doesn't think, but for me this also happens on complex queries where I WOULD want it to think. Even things like "super think about this" etc. don't help; it just refuses to think.


Gemini 2.5 Pro is getting thinking budgets when it GAs in June (at least that's the promise).


This is available for Flash.
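For reference, this is roughly what the thinking-budget control looks like with the google-genai Python SDK; the model ID and budget value below are placeholders, and exact field names may differ by SDK version, so check the current docs.

```python
# Sketch: capping the thinking budget on a Gemini Flash request via the
# google-genai SDK. Model ID and token budget are placeholder values.
from google import genai
from google.genai import types

client = genai.Client()  # reads GOOGLE_API_KEY from the environment

response = client.models.generate_content(
    model="gemini-2.5-flash-preview-05-20",   # placeholder model ID
    contents="What's 2 + 2?",
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(thinking_budget=0)  # 0 = no thinking
    ),
)
print(response.text)
```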


Yes, we started with the idea of trying to replicate similar control over the thinking process for open reasoning models. They also announced the Deep Think approach at I/O, which goes even further and combines parallel CoTs at inference.
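For intuition, here's a minimal sketch of the general "parallel CoTs" idea in the style of self-consistency decoding: sample several independent reasoning traces and vote on the final answer. The ask_model stub is a placeholder for a real LLM call, and Google hasn't published Deep Think's actual combination strategy, so this is only illustrative.

```python
# Sketch of self-consistency-style parallel CoT: sample several reasoning
# traces and take a majority vote over their final answers.
import random
from collections import Counter

def ask_model(question: str, seed: int) -> str:
    """Placeholder for one sampled reasoning trace; returns its final answer."""
    random.seed(seed)
    return random.choice(["4", "4", "4", "5"])  # toy, mostly-correct "model"

def parallel_cot(question: str, n_samples: int = 8) -> str:
    answers = [ask_model(question, seed=i) for i in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]  # majority-voted answer

print(parallel_cot("What's 2 + 2?"))  # -> "4"
```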


> I think the same goes for o3.

Definitely, in my experience. Elsewhere in the thread, OP says that open models/systems don't do this, in which case this seems like important work toward making open alternatives competitive.


Is that not just caching? If you have the same query, just return the same response.

You could even put a simpler AI in front to decide if it was effectively the same query.
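Roughly what "a simpler AI in front" could look like: a cache keyed by query similarity rather than exact match. The similarity measure below is a toy stand-in (difflib string similarity); a real system would more likely use a cheap embedding model or classifier.

```python
# Toy semantic cache: reuse a stored answer when a new query is "close
# enough" to a previously seen one. SequenceMatcher is a stand-in for a
# cheap embedding/classifier model sitting in front of the LLM.
from difflib import SequenceMatcher

class QueryCache:
    def __init__(self, threshold: float = 0.85):
        self.threshold = threshold
        self.entries: list[tuple[str, str]] = []  # (query, answer)

    def lookup(self, query: str) -> str | None:
        for cached_query, answer in self.entries:
            similarity = SequenceMatcher(None, query.lower(), cached_query.lower()).ratio()
            if similarity >= self.threshold:
                return answer
        return None

    def store(self, query: str, answer: str) -> None:
        self.entries.append((query, answer))

cache = QueryCache()
cache.store("what's 2+2?", "4")
print(cache.lookup("What's 2 + 2?"))                # hit under the toy metric
print(cache.lookup("Prove Fermat's last theorem"))  # miss -> None
```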


Has Gemini or OpenAI put out any articles on this or is this just something you noticed?



