I asked it a few questions and it responded exactly like all the other models do. Some of the questions were difficult / very specific, and it failed in the same way all the other models failed.
Great example of this general class of reasoning failure:
“AI does badly on my test, therefore it’s bad.”
The correct question to ask is, of course, what is it good at? (For bonus points, think in terms of $/task rather than simply whether it outperforms humans.)
“I used an 8088 CPU to whisk egg whites, then an Intel Core 9i-12000-vk4*, and they made equally mediocre meringues, therefore the latest Intel processor isn’t a significant improvement over one from 50 years ago.”