How much have you used Claude 4?

hsn915 · 2025-05-22T20:24:20 1747945460

I asked it a few questions and it responded exactly like all the other models do. Some of the questions were difficult / very specific, and it failed in the same way all the other models failed.

theptip · 2025-05-23T04:51:15 1747975875

Great example of this general class of reasoning failure.

“AI does badly on my test therefore it’s bad”.

The correct question to ask is, of course, what is it good at? (For bonus points, think in terms of $/task rather than simply whether it dominates humans.)

atworkc · 2025-05-23T09:05:03 1747991103

"AI does badly on my test much like other AIs did before it, therefore I don't immediately see much improvement" is a fair assumption.

brookst · 2025-05-23T12:52:35 1748004755

No, it’s really not.

“I used an 8088 CPU to whisk egg whites, then an Intel core 9i-12000-vk4*, and they were equally mediocre meringues, therefore the latest Intel processor isn’t a significant improvement over one from 50 years ago”

* Bear with me, no idea their current naming

Kon-Peki · 2025-05-23T16:09:37 1748016577

You’re holding them wrong. An 8088 package should be able to emulate a whisk about a million times better than an i9.

theptip · 2025-05-23T14:49:18 1748011758

“Human can’t fly, much like other humans. Therefore it’s bad”

Spot the problem now?

AI capabilities are highly jagged: they are clearly superhuman in many dimensions and laughably bad compared to humans in others.