Came here to learn what people think about Claude 4. Seems to be only armchair opinions on previous versions and the state of AI.
The industry is not at all surprised that the current architecture of LLMs reached a plateau. Every other machine learning architecture we've ever used has gone through exactly the same cycle, and frankly we're all surprised how far this current architecture has gotten us.
DeepMind and OpenAI both publicly stated that they expected 2025 to be slow, particularly in terms of intelligence, while they work on future foundation models.
I've been using `claude-4-sonnet` for the last few hours - haven't been able to test `opus` yet as it's still overloaded - but I have noticed a massive improvement so far.
I spent most of yesterday working on a tricky refactor (in a large codebase), rotating through `3.7/3.5/gemini/deepseek`, and barely making progress. I want to say I was running into context issues (even with very targeted prompts) but 3.7 loves a good rabbit-hole, so maybe it was that.
I also added a new "ticketing" system (via rules) to help its task-specific memory, which I didn't really get to test with 3.7 (before 4.0 came out), so I'm unsure how much of an impact this has.
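For the curious, the "ticketing" rule is roughly the kind of thing sketched below - this is a paraphrased, hypothetical example, not my exact rules file, and the file name/paths are made up:

```
# .cursor/rules/ticketing.mdc (hypothetical sketch)
Before starting any task, create a ticket file under tickets/ named
<date>-<short-slug>.md containing: the goal, the files you expect to touch,
and a checklist of sub-tasks.
As you work, tick off completed sub-tasks and note any decisions or dead ends,
so a fresh conversation can pick up where the last one left off.
Only mark the ticket done once tests pass and the docs are updated.
```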
Using 4.0, the rest of this refactor (est. ~4 hrs w/ 3.7) took `sonnet-4.0` 45 minutes, including updating all of the documentation and tests (which with 3.7 normally requires multiple additional prompts, despite being outlined in my rules files).
The biggest differences I've noticed:
- much more accurate/consistent; it actually finishes tasks rather than claiming it's done while nothing works
- less likely to get stuck in a rabbit hole
- stopped getting stuck when unable to fix something (trying the same 3 solutions over and over)
- runs for MUCH longer without my intervention
- when using 3.7:
  - I had to prompt once every few minutes, 5-10 mins MAX if the task was straightforward enough
  - I had to cancel the output on 1 in 4 prompts as it'd get stuck in the same thought-loops
  - I needed to restore from a previous checkpoint every few chats/conversations
- with 4.0:
  - I've had 4 hours of basically one-shotting everything
  - prompts run for 10 mins MIN, and the output actually works
  - it remembers to run tests, fix errors, update docs, etc.
Obviously this is purely anecdotal - and, considering the temperament of LLMs, maybe I've just been lucky and will be back to cursing at it tomorrow - but imo this is the best-feeling model since 3.5 was released.