
> I encourage everyone to reproduce this test just to see how well current AI works for this use case.

I do this regularly and find it very enlightening. After I've read a news article or done my own research on a topic, I'll ask ChatGPT to do the same.

You have to be careful not to grade its response on a curve: read it as if you hadn't done the research and didn't know the background. I find myself saying "I can see why it might be confused into thinking X, but that doesn't change the fact that it was wrong/misleading."

I do like it when LLMs cite their sources, mostly because that's how I find out they're wrong. Many times I've read a summary, followed it to the source, read the entire source, and realized it says nothing of the sort. But almost always, I can see where the model glued together pieces of the source, incorrectly.

A great micro-example of this is Apple's Siri notification summaries. Every time they mess up hilariously, I can see exactly how they got there. But they're also mistakes no human would ever make.
