Disillusioned Businesses Discovering That AI Kind of Sucks

zdragnar · on March 31, 2024

The real question, to my mind, is "is the current generation of AI yet another dead end?", because whether or not the tech can improve upon its flaws will determine whether or not it is worth a business investing in.

We've gone through AI winters before, where the actual techniques and hardware simply had a terminal point of usefulness beyond which it was unlikely to grow.

If hallucinating bad information is to be regularly expected / intrinsic to the tech, then it's basically Clippy 2.0 and a dead end.

On the other hand, if we can expect lower power costs and higher trust in the output (i.e. needing less human intervention) then it makes sense to start finding places where it can fit into the business and grow over time.

I'm personally in the camp that it is a fun toy but with limited applicability for most businesses, and unlikely to grow beyond that. I'd love to be proven wrong, though.

TillE · on March 31, 2024

I've found that LLMs are most exciting and useful when factual accuracy is irrelevant, like as a personal tool for creative brainstorming. Or searching for information that you're going to immediately validate anyway.

It's definitely something more than a toy, but I'm not sure that you can build a trillion dollar industry on that kind of stuff.

logicprog · on March 31, 2024

The problem with using large language models for brainstorming or writing is that the fundamental mechanism by which they work is to choose the most common thing to say at any given point — that is, the most average, middle of the bell curve thing to say. That's how they give the appearance of having any form of coherence, by rarely if ever deviating from the happy path. So any ideas you get from it are going to be pretty unoriginal, and even if you give it original prompts for ideas, it's eventually just going to regress back to the mean again as it traverses the probability space back in the direction of the average at every step. And its writing is always going to be essentially the average human writing style.

kthejoker2 · on April 1, 2024

The public foundational models maybe.

But this isn't true in general, you can easily train a local model to write in very localized styles, and include temperatures that allow for wild swings outside of average.

If you want a rambling, occasionally brilliant Kerouac or de Montaigne you can make one.

seanmcdirmid · on March 31, 2024

Ya, having AI write a story based on some criteria, hallucinations aren’t much of a concern. I wonder, however, if you can curate and add to your content to make a AI chat bot more effective, since they are all attention based, focus on the attention and then present AI has the tool for getting info for that topic.

andy99 · on March 31, 2024

FWIW, I think inaccuracy - what people call hallucination - will probably be limiting in some applications, like public facing broad based chat applications with stringent requirements on factuality.

There are lots of other applications - in particular I'm thinking about turning unstructured data into structured data for aggregation and analysis, where you can tolerate some level of error and LLMs can work quite well to automate it.

Beyond that, obviously copilot or assistant type stuff is already something people use. It doesn't have to be flawless for that, the main use case for chatGPT is not making high stakes decisions blindly based on the answers it gives out. It makes mistakes and does dumb stuff sometimes but it's also helpful, the flaws aren't a dealbreaker even if people would like them to be.

At the same time, I don't expect it to get much better, it's going to be about getting useful work out of what we have.

onthecanposting · on March 31, 2024

The criticism in the article seems pretty weak. If the goal was an omniscient oracle, then, yeah, probably not the god you were expecting. However, I don't see how this is not more cost effective than an outsourced call center. Speaking to a person in another country who barely understands my language and has 5 scripted responses can't be an improvement. The article does not even speak to error rates for humans performing the same task.

mnk47 · on March 31, 2024

>On the other hand, if we can expect lower power costs and higher trust in the output (i.e. needing less human intervention) then it makes sense to start finding places where it can fit into the business and grow over time.

So far, simply throwing more compute at the problem is making models hallucinate less and less. I think this easily explains why all the big players are making such massive investments in hardware, reliability seems to be an almost solved problem.

wkirby · on March 30, 2024

Our agency has been asked to explore AI projects for a half dozen clients over the past 18 months. None of them have actually rolled out to real users. We keep finding the same things: the “AI” backed tool is worse than the people it’s supposed to replace, and too costly to implement and maintain at any real scale. Mix in concerns about PHI (we primarily work with healthcare-related businesses) and it all amounts to the same story: that’s cool, but…

vundercind · on March 30, 2024

Yeah, the expense of doing it “right” is really high and much of it’s an ongoing cost, while the cases in which it’s better than cheaper options like hiring a few more lowish-wage workers or doing a little training or just… using some normal-ass program that does the same thing but a bit less “smart”, aren’t as common as businesses seem to wish they were.

dools · on March 30, 2024

You just have to be a bit more specific. I have had success with AI writing code but I have to drive it. It’s a massive time saver though.

I have also had success with classification tasks, like look at this email and then look at this list of topics, and pick which topic this email relates to, or “other” if you can’t see an obvious choice.

But you can’t say “hey AI do this person’s job for me”.

mnk47 · on March 31, 2024

Yeah, I think we're just at an awkward point where you still have to do some prompting tricks to make it work as expected. Hopefully this all gets smoothed out soon and makes hacky solutions like "prompt engineering" less important.

krapp · on March 30, 2024

>But you can’t say “hey AI do this person’s job for me”.

Unfortunately, that's exactly what businesses want to accomplish.

spamizbad · on March 30, 2024

> But you can’t say “hey AI do this person’s job for me”.

If AI needs to steered around by an expert to be effective that's going to limit its appeal.

etempleton · on March 31, 2024

I was thinking about this the other day. AI reminds me of your average to below average intern. They kind of get it right, but you have to spend time holding their hand and correcting their work.

I don’t see AI replacing experienced professionals any time soon, but I do see it replacing the need for some entry level employees soon.

dools · on March 31, 2024

Why? I wrote a regex in 45 minutes the other day that would have taken me days to get right without AI. It’s an enormously underpriced production boost right now.

spamizbad · on March 31, 2024

To me this a huge signal that a regex was not the correct approach and that AI just steered you directly into (or passed) Kernighan's Law.

dools · on March 31, 2024

No, it’s a very specific regex for parsing a very finicky file format. It’s definitely the best tool for the job, although funnily enough ChatGPT couldn’t get it right in a single regex but Claude could.

ChatGPT chickened out and kept giving me functions that looped through applying various filters and regexes.

daymanstep · on March 31, 2024

How do you know that the AI got it right?

dools · on March 31, 2024

The the code parses the files correctly.

add-sub-mul-div · on March 31, 2024

Maybe the LLM prevented an intervention where you were asked why the thing you were working on was taking days, surfacing the need for a better approach.

dools · on March 31, 2024

Nope, about a year or so ago we had a similar problem with the same piece of code and instead of fixing it I said “that’ll take me a couple of days the get right” and the guy I was doing it for just went and manually fixed the file format which took him a few hours.

This time when a similar thing happened I was able to fix it before he even had to ask how long it might take and the solution is more robust and able to handle several different formats now.

It was a definite win.

atmavatar · on March 31, 2024

> You just have to be a bit more specific. I have had success with AI writing code but I have to drive it.

"Computer, in the Holmesian style, create a mystery to confound Data with an opponent who has the ability to defeat him."

MattGaiser · on March 30, 2024

It is closer to a shovel vs hands rather than an automated excavator.

svaha1728 · on March 31, 2024

And the gems you are trying to excavate might not even be in the training set.

GolDDranks · on March 31, 2024

We are just a month away from releasing an internal tool for our sales dept that combined LLM and old fashioned statistical models to generate draft sales speeches. (We are a selling to businesses on a specific field, and our sales speeches are usually highly data-driven and customised per client, placing emphasis of how much additional revenue would the client get out of the purchase according to our analysis.)

It seems to work pretty well for this scenario. The usage is company internal (only draft generation) and the sales reps are augmented by the AI, not replaced by it.

sirspacey · on March 31, 2024

That sounds really cool. We’ve been doing call mapping to account plans and it’s pretty useful so far.

geor9e · on March 31, 2024

AI doesn't suck any more than a socket wrench sucks in the hand of a bad mechanic. But ask a world class mechanic if they'd ever give up their socket wrench. Some businesses just aren't competent at taking advantage of new things, and competition is fierce. Just like the dot-com era, everyone wants a piece of this new pie, even less competent folks, and most will fail, but many will get rich. Plenty of businesses are already making gobs of income, with just some public model off civitai and a well crafted prompt, wrappered into a slick subscription website. Personally I find AI a godsend and have duct taped dozens of HTTP POST API calls with huge custom prompts to anthropic/openai/groq/etc all over my mac hotkeys and phone voice assistants (via Tasker tasks) etc. Anything that an LLM can do to make my life a little easier, I turn it into a HTTP request and tie it in to some automation.

foinker · on March 31, 2024

I don't entirely disagree with you, but I think your wrench analogy only really works if the wrench sometimes thinks it is a screwdriver or a piece of wood for the majority of users, and only experienced highly technical mechanics can consistently guide it to remembering that it is a wrench.

The problem is that the toothpaste is already out of the tube in terms of what it's been hyped to be for the average person, and they are disappointed to learn that it doesn't really work that way. And there are many people who have not yet realized, or realized too late, that it's not appropriate for the work they're doing and its "assistance" is ending up in finished products in any case.

Nullabillity · on March 31, 2024

A socket wrench won't randomly decide to turn the wrong direction every os often.

ChrisArchitect · on March 30, 2024

Actual article: https://www.axios.com/2024/03/27/ai-chatbot-letdown-hype-rea...

MattPalmer1086 · on March 31, 2024

No, businesses are discovering that the uses that it is being sold to them for are inappropriate.

It may not be factually accurate. It may say disturbing things. It is not a reliable company representative. This does not mean AI sucks, it means you are trying to use it for things it is not appropriate for.

On the other hand, it can be a massive time saver for staff who know what they're doing and can interpret it's output. It's a tool that can boost output, not a replacement for people.

gedy · on March 30, 2024

With LLMs at least, it feels like it's best to just treat "AI" as another type of user interface. If your business/idea can use an interface like this, cool, it might be a nicer way to interact. But it's not there yet for replacing most employees, and honestly feels like it's being used as a scape goat for a crappy economy or monetary policy driven layoffs.

phkahler · on March 31, 2024

If you watch Sam Altman talking on the Lex Friedman podcast, he clearly thinks ChatGPT 4 is kinda cool and that 5 will be better. But he really dropped the hype and sounded more like this article.

silverquiet · on March 30, 2024

A friend of mine’s mother came down with a rare form of dementia at a sadly young age for that to happen. It was a strange experience talking with her because in most ways she was very lucid, but sometimes she would just veer off into stories that were obviously delusional. Conversing with the LLMs rather reminds me of that sometimes. But she was not able to work because of this disability.

ben_w · on March 31, 2024

Likewise, I had similar vibes between early versions of ChatGPT and my mother's Alzheimer's.

Not had that vibe for a while now, but I definitely noticed it.

ashildr · on March 31, 2024

LLM are not “AI”.

lardo · on March 31, 2024

What is?

ashildr · on April 11, 2024

Nothing, for now.

andy99 · on March 30, 2024

So the headline is pretty clearly right, we were coming out of the circa 2016 deep learning bubble and expectations were being adjusted, gen AI came in and the hype roared back up, now we're peaking, the same problems from before are still there even if the demos are cooler, and expectations need to be tempered again.

But instead of that, the article starts with

  The tech's drawbacks are hard to overlook. Large language models like ChatGPT are prone to hallucinating and spreading misinformation. Both chatbots and AI image makers have been accused of plagiarizing writers and artists. And overall, the hardware that generative AI uses needs enormous amounts of energy, gutting the environment.

None of those (hallucination maybe) are relevant, if it's good at automating misinformation, surely it can do useful work as well. This is more just a list of random criticisms.

Then

  Perhaps most of all, according to Gary Marcus...

No point in continuing to read.

pavel_lishin · on March 30, 2024

AI giving wildly incorrect answers is incredibly relevant.

And if it can "surely" do useful work... why isn't it? That's like saying, "Sure, this car burns oil, but if it can do that, it can surely do more than that and be my daily driver."

ben_w · on March 30, 2024

AI does do useful work.

It's also being treated as magic, with all the same unrealistic expectations.

To treat it as more than it is, is as much a mistake as to treat it as less than it is.

I'm glad it's "only" at the level of an intern (sometimes you're willing to hire them, sometimes you're not); but conversely I think this means I can't risk changing career.

Agingcoder · on March 30, 2024

No - these are exactly the issues being brought up where I work.

They make mistakes, carry potential legal liabilities and cost a fortune ( dollars or carbon ) to run. The only difference with before is that they now look a lot closer to being able to replace people for real, and that’s a significant difference.

Now, replace ‘ai’ with ‘employee’ and the profits must be very clear to hire such an unreliable person. Some companies worry about these things.

In my view if the amount of mistakes could be acceptably low, this would change the game ( both ‘ acceptably’ and ‘low’ are valid directions).

Nullabillity · on March 31, 2024

> None of those (hallucination maybe) are relevant, if it's good at automating misinformation, surely it can do useful work as well.

Filtering out the truth is the hard part. If you just want data and don't care about quality there's always /dev/urandom.