> Vibe coding would be catastrophic here. Not because the AI can't write the code - it usually can - but because the failure mode is invisible. A hallucinated edge case in a tax calculation doesn't throw an error. It just produces a slightly wrong number that gets posted to a real accounting platform and nobody notices until the accountant does their review.
How is that different from handwritten code ? Sounds like stuff you deal with architecturally (auditable/with review/rollback) and with tests.
It’s shocking to me that people even ask this type of question. How do you not see the difference between a machine that will hallucinate something random if it doesn’t know the answer vs a human that will logic through things and find the correct answer.
Because I've seen the results ? Failure mode of LLMs are unintuitive, and the ability to grasp the big picture is limited (by context mostly I'd say), but I find CC to follow instructions better than 80% of people I've worked with. And the amount of mental stamina it would take to grok that much context even when you know the system vs. what these systems can do in minutes.
As for the hallucinations - you're there to keep the system grounded. Well the compiler is, then tests, then you. It works surprisingly well if you monitor the process and don't let LLM wander off when it gets confused.
Because humans also make stupid random mistakes, and if your test suite and defensive practices don't catch it, the only difference is the rate of errors.
It may be that you've done the risk management, and deemed the risk acceptable (accepting the risk, in risk management terms) with human developers and that vibecoding changes the maths.
But that is still an admission that your test suite has gaping holes. If that's been allowed to happen consciously, recorded in your risk register, and you all understand the consequences, that can be entirely fine.
But the problem then isn't reflecting a problem with vibe coding, but a risk management choice you made to paper over test suite holes with an assumed level of human dilligence.
> How do you not see the difference between a machine that will hallucinate something random if it doesn’t know the answer vs a human...
Your claim here is that humans can't hallucinate something random. Clearly they can and do.
> ... that will logic through things and find the correct answer.
But humans do not find the correct answer 100% of the time.
The way that we address human fallibility is to create a system that does not accept the input of a single human as "truth". Even these systems only achieve "very high probability" but not 100% correctness. We can employ these same systems with AI.
Almost all current software engineering practices and projects rely on humans doing ongoing "informal" verification. The engineers' knowledge is integral part of it and using LLMs exposes this "vulnerability" (if you want to call it that). Making LLMs usable would require such a degree of formalization (of which integration and end-to-end tests are a part), that entire software categories would become unviable. Nobody would pay for an accounting suite that cost 10-20x more.
Which interestingly is the meat of this article. The key points aren’t that “vibe coding is bad” but that the design and experience of these tools is actively blinding and seductive in a way that impairs ability to judge effectiveness.
Basically, instead of developers developing, they've been half-elevated to the management class where they manage really dumb but really fast interns (LLM's).
But they dont get the management pay, and they are 100% responsible for the LLMs under them. Whereas real managers get paid more and can lay blame and fire people under them.
Humans who fail to do so find the list of tasks they’re allowed to do suddenly curtailed. I’m sure there is a degree of this with LLMs but the fanboys haven’t started admitting it yet.
> It’s shocking to me that people even ask this type of question. How do you not see the difference between a machine that will hallucinate something random if it doesn’t know the answer vs a human that will logic through things and find the correct answer.
I would like to work with the humans you describe who, implicitly from your description, don't hallucinate something random when they don't know the answer.
I mean, I only recently finished dealing with around 18 months of an entire customer service department full of people who couldn't comprehend that they'd put a non-existent postal address and the wrong person on the bills they were sending, and this was therefore their own fault the bills weren't getting paid, and that other people in their own team had already admitted this, apologised to me, promised they'd fixed it, while actually still continuing to send letters to the same non-existent address.
Don't get me wrong, I'm not saying AI is magic (at best it's just one more pair of eyes no matter how many models you use), but humans are also not magic.
Humans are accountable to each other. Humans can be shamed in a code review and reprimanded and threatened with consequences for sloppy work. Most,
humans once reprimanded , will not make the same kind of mistake twice.
> Humans can be shamed in a code review and reprimanded and threatened with consequences for sloppy work.
I had to not merely threaten to involve the Ombudsman, but actually involve the Ombudsman.
That was after I had already escalated several times and gotten as far as raising it with the Data Protection Officer of their parent company.
> Most, humans once reprimanded , will not make the same kind of mistake twice.
To quote myself:
other people in their own team had already admitted this, apologised to me, promised they'd fixed it, while actually still continuing to send letters to the same non-existent address.
> How do you not see the difference between a machine that will hallucinate something random if it doesn’t know the answer vs a human that will logic through things and find the correct answer.
I see this argument over and over agin when it comes to LLMs and vibe coding. I find it a laughable one having worked in software for 20 years. I am 100% certain the humans are just as capable if not better than LLMs at generating spaghetti code, bugs, and nonsensical errors.
It's shocking to me that people make this claim as if humans, especially in some legacy accounting system, would somehow be much better at (1) recognizing their mistakes, and (2) even when they don't, not fudge-fingering their implementation. Like the criticisms of agents are valid, but the incredulity that they will ever be used in production or high risk systems to me is just as incredible. Of course they will -- where is Opus 4.6 compared to Sonnet 4? We've hit an inflection point where replacing hand coding with an agent and interacting only via prompt is not only doable, highly skilled people are already routinely doing it. Companies are already _requiring_ that people do it. We will then hit an inflection point at some time soon where the incredulity at using agents even in the highest stakes application will age really really poorly. Let's see!
Your point is the speculative one, though. We know humans can and have built incredibly complex and reliable systems. We do not have the same level of proof for LLMs.
Claims like your should wait at least 2-3 years, if not 5.
That is also speculative. Well let's just wait and see :) but the writing is on the wall. If your criticism is where we're at _now_ and whether or not _today_ you should be vibe coding in highly complex systems I would say: why not? as long as you hold that code to the same standard as human written code, what is the problem? If you say "well reviews don't catch everything" ok but the same is true for humans. Yes large teams of people (and maybe smaller teams of highly skilled people) have built wonderfully complex systems far out of reach of today's coding agents. But your median programmer is not going to be able to do that.
Your comment is shocking to me. AI coding works. I have seen it with my own eyes last week and today.
I can therefore only assume that you have not coded with the latest models. If you experiences are with GPT 4o or earlier all you have only used the mini or light models, then I can totally understand where you’re coming from. Those models can do a lot, but they aren’t good enough to run on their own.
The latest models absolutely are I have seen it with my own eyes. Ai moves fast.
I think the point he is trying to make is that you can't outsource your thinking to a automated process and also trust it to make the right decisions at the same time.
In places where a number, fraction, or a non binary outcome is involved there is an aspect of growing the code base with time and human knowledge/failure.
You could argue that speed of writing code isn't everything, many times being correct and stable likely is more important. For eg- A banking app, doesn't have be written and shipped fast. But it has to be done right. ECG machines, money, meat space safety automation all come under this.
Replace LLM with employee in your argument - what changes ? Unless everyone at your workplace owns the system they are working on - this is a very high bar and maybe 50% of devs I've worked with are capable of owning a piece of non trivial code, especially if they didn't write it.
Realiy is you don't solve these problems by to relying on everyone to be perfect - everyone slips up - to achieve results consistently you need process/systems to assure quality.
Safety critical system should be even better equipped to adopt this because they already have the systems to promote correct outputs.
The problem is those systems weren't built for LLMs specifically so the unexpected failure cases and the volume might not be a perfect fit - but then you work on adapting the quality control system.
>>replace LLM with employee in your argument - what changes ?
I mentioned this part in my comment. You cannot trust an automated process to a thing, and expect the same process to verify if it did it right. This is with regards to any automated process, not just code.
This is not the same as manufacturing, as in manufacturing you make the same part thousands of times. In code the automated process makes a specific customised thing only once, and it has to be right.
>>The problem is those systems weren't built for LLMs specifically so the unexpected failure cases ...
We are not talking of failures. There is a space between success and failure where the LLM can go into easily.
That's not what I get out of the comment you are replying to.
In the case being discussed here, one of code matching the tax code, perfection is likely possible; perfection is defined by the tax code. The SME on this should be writing the tests that demonstrate adhering with the tax code. Once they do that, then it doesn't matter if they, or the AI, or a one shot consultant write it, as far as correctness goes.
If the resulting AI code has subtle bugs in it that pass the test, the SME likely didn't understand the corner cases of this part of the tax code as well as they thought, and quite possibly could have run into the same bugs.
That's what I get out of what you are replying to.
With handwritten code, the humans know what they don’t know. If you want some constants or some formula, you don’t invent or guess it, you ask the domain expert.
Let's put it this way: the human author is capable of doing so. The LLM is not. You can cultivate the human to learn to think in this way. You can for a brief period coerce an LLM to do so.
Humans make such mistakes slowly. It's much harder to catch the "drift" introduced by LLM because it happens so quickly and silently. By the time you notice something is wrong, it has already become the foundation for more code. You are then looking at a full rewrite.
The rate of the mistakes versus the rate of consumers and testers finding them was a ratio we could deal with and we don’t have the facilities to deal with the new ratio.
It is likely over time that AI code will necessitate the use of more elaborate canary systems that increase the cost per feature quite considerably. Particularly for small and mid sized orgs where those costs are difficult to amortize.
How is that different from handwritten code ? Sounds like stuff you deal with architecturally (auditable/with review/rollback) and with tests.