
_sigh_

Yes, LLMs hallucinate; no, it's not 2022 anymore, when ChatGPT (gpt-3.5) was the pinnacle of LLM tech. Modern LLMs in an agentic loop can self-correct. You still need to be on guard, but used correctly (yes, yes, "holding it wrong", etc. etc.) they can do many, many tasks that do not suffer from "need to check every single time".
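
To be concrete, the "agentic loop" here is nothing exotic: it's roughly a generate/verify/retry cycle. A minimal sketch in Python (ask_llm and run_checks are hypothetical stand-ins for a model call and an external verifier like tests or a compiler, not any particular vendor's API):

    # Minimal sketch of an agentic self-correction loop (illustrative only).
    # ask_llm and run_checks are hypothetical stand-ins, not a real API.
    def agent_loop(task, ask_llm, run_checks, max_attempts=5):
        feedback = ""
        for _ in range(max_attempts):
            answer = ask_llm(task, feedback)    # model proposes a solution
            ok, feedback = run_checks(answer)   # external check, not the model grading itself
            if ok:
                return answer                   # verified result
        return None                             # give up; a human has to look

The self-correction is only as good as whatever run_checks actually verifies; without a real external check you're back to trusting the model's own confidence.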



I must be holding it wrong then: looking at my recent ChatGPT history, I've abandoned two-thirds of my conversations because it wasn't coming up with anything useful.

Granted, most of that was debugging some rather complicated TypeScript types in a custom JSX namespace, which would probably be considered hard even for most humans, and there are comparatively few resources on it to be found online. But the issue is that, overall, it wasted more of my time than it saved with its confidently wrong answers.

When I look at my history I don't see anything that would be worth twenty bucks - what I see makes me think that I should be the one getting paid.


I think the reason people talk past each other on this is that some of them are using LLMs for every little question they have, and others are using them only for questions that they can't trivially answer some other way. Sure, if all your questions have straightforward, uncontroversial answers then the LLMs will often find them on the first try, but on the other hand you'd also find them on the first try on Wikipedia, or the man page, or a Google search. You'll only think ChatGPT is useful if you've forgotten how to use the web.

If you're only asking genuinely difficult questions, then you need to check every single time. And it's worse, because for genuinely difficult questions, it's often just as hard to check whether it's giving garbage as it would have been to learn enough to answer the question in the first place.


If a coworker is wrong 40% or 60% of the time, I'll ignore their suggestions either way.


As you should, but an LLM is not a human, nor is it categorically 40-60% wrong, so I'm not sure what your point is.


> Modern LLMs in an agentic loop can self correct

If the problem as stated is "Performing an LLM query at newly inflated cost $X is an iffy value proposition because I'm not sure if it will give me a correct answer" then I don't see how "use a tool that keeps generating queries until it gets it right" (which seems like it is basically what you are advocating for) is the solution.

I mean, yeah, the result will be more correct answers than if you just made one-off queries to the LLM, but the costs spiral out of control even faster because the agent is going to be generating more costly queries to reach that answer.
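
Back-of-the-envelope, with made-up numbers just to show the shape of it: if one attempt costs X and passes verification with probability p, a retry-until-correct loop costs roughly X/p in expectation (it's a geometric distribution), plus whatever the verification step itself costs each round.

    # Rough expected cost of a retry-until-correct loop (numbers are made up).
    cost_per_attempt = 0.10                  # dollars per call, illustrative
    p_success = 0.6                          # chance one attempt passes checks, illustrative
    expected_attempts = 1 / p_success        # mean of a geometric distribution
    expected_cost = cost_per_attempt * expected_attempts
    print(expected_attempts, expected_cost)  # ~1.67 attempts, ~$0.17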


Apologies that you're taking it on the chin here. Generally, I'll just skip fantastical HN threads with a critical mass of BS like this, with pity, rather than attempt to share (for more on that, cf. https://news.ycombinator.com/item?id=45929335).

Been on HN 16 years and never seen anything like the pack of people who will come out to tell you it doesn't work and they'll never pay for it and it's wrong 50% of the time, etc.

Was at dinner with an MD a few nights back and we were riffing on this. We came to the conclusion that it was really fun for CS people when the idea was that AI would replace radiologists, but when the first to be mowed down are the keyboard monkeys, well, it's personal, and you get people who are years into a cognitive dissonance thing now.


I just totally disagree.

I want AI to be as strong as possible. I want AGI, I especially want super intelligence. I will figure out a new and better job if you give me super intelligence.

The problem is not cognitive dissonance, the problem is we don't have what we are pretending we have.

We have the dot-com bubble, but with a bunch of Gopher servers and the web browser as this theoretical idea yet to be invented, and that is the bull case. The bear case is that we have the dot-com bubble but still haven't figured out how to build the actual internet: massive investment in rotary phone capacity because everyone in the future is going to be using so much dial-up bandwidth once we finally figure out how to build the internet.


Yeah, it really pulled the veil away, didn't it? So much dismissiveness and so many uninformed takes, from a crowd that had been driving automation forward for years and years; you'd think they'd want to get familiar with this new class of tools, warts and all.


I just can't understand how anyone who actually uses the tools all the time can say this.


Say what, exactly? Driving automation of all kinds with Claude Code-level tools has been incredibly fruitful. And once you've spent sufficient time with them, you know when and where they fall on their faces and when they provide real, tangible, reproducible benefits. I couldn't care less about the AI hype or bubble or whatever; I just use what I see works, since I'm staring these tools down for 10h+/day.

The problem is that these conversations are increasingly drifting apart, because everyone has different priors and experiences with this stuff. Some are stuck in 2023, some have such specialized tasks that whipping the agent into line is more work than it saves, and others have found a ton of automation cases where this stuff provides clear net benefits.

Don't care for AGI, AI girlfriends or LLM slop, but strap 'em in a loop and build a cage for them to operate in without lobotomizing themselves and there's absolutely something to be gained there (for me, at least).


really? >>many tasks that do not suffer from "need to check every single time"

like which tasks?

How do you decide whether you need to check or not?

If you're asking it to complete 100 sequences and the error rate is 5%, which 5% of the sequences do you think it messed up or _thought_ otherwise? If the bad 5% is in the middle, would the next 50 sequences be okay?
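
To put rough numbers on that (assuming the errors are independent, which is generous): at a 5% per-item error rate over 100 items, the chance that every single one is right is about 0.6%, and you expect around 5 errors with no idea where they landed, so you end up checking everything anyway.

    # 5% per-item error rate over 100 independent items (illustrative assumption)
    p_all_correct = 0.95 ** 100    # ~0.0059, i.e. ~0.6% chance of zero errors
    expected_errors = 100 * 0.05   # ~5 errors, positions unknown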


> really? >>many tasks that do not suffer from "need to check every single time"

> like which tasks?

Making slop.


If I ask an LLM to guess what number I’m thinking of and it’s wrong 99.9% of the time, the error is not in the LLM.



