I've released a templatized local development setup using devcontainers that I've refined over the last year and now use on all my projects. This post explains the why and links to the project.
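For anyone who hasn't tried devcontainers: the whole setup hangs off one JSON file in the repo. A minimal sketch (the image, feature, and command here are illustrative, not taken from my template):

```jsonc
// .devcontainer/devcontainer.json -- minimal illustrative example
{
  "name": "my-project",
  // Base image maintained by the devcontainers project
  "image": "mcr.microsoft.com/devcontainers/base:ubuntu",
  // Features layer in extra tooling without a custom Dockerfile
  "features": {
    "ghcr.io/devcontainers/features/go:1": {}
  },
  // Runs once after the container is created
  "postCreateCommand": "make setup"
}
```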
It's potentially the opposite. If you instrument a codebase with documentation and configuration for AI agents to work well in it, then in a year, that agent will be able to do that same work just as well (or better with model progress) at adding new features.
This assumes you're adding documentation, tests, instructions, and other scaffolding along the way, of course.
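Concretely, that scaffolding can be as simple as an instructions file checked into the repo root that agents read on every run; a made-up sketch:

```markdown
# CLAUDE.md -- illustrative agent scaffolding, contents invented
## Build & test
- `make test` runs the full suite; run it before proposing any change.
## Conventions
- New features get a doc page under docs/features/.
- Follow the error-handling pattern used in internal/errors.
```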
I wonder how soon it will be (or if it's already happening) that AI coding tools behave like early-career developers who claim all the existing code written by others is crap and go on to convince management that a ground-up rewrite is required.
(And now I'm wondering how soon the standard AI-first response to bug reports will be a complete rewrite by AI using the previous prompts plus the new bug report. Are people already working on CI/CD systems that replace the CI part with whole-project AI rewrites?)
As the cost of AI-generated code approaches zero (both in time and money), I see nothing wrong with letting the AI agent spin up a dev environment and take its best shot. If it can prove with rigorous testing that the new code works, is at least as reliable as the old code, and is written better, then it's a win/win. If not, delete that agent and move on.
On the other hand, if the agent is just as capable of fixing bugs in legacy code as rewriting it, and humans are no longer in the loop, who cares if it's legacy code?
But I can see it "working". At least for the values of "working" that would be "good enough" for a large portion of the production code I've written or overseen in my 30+ year career.
Some code pretty much outlasts all expectations because it just works. I had a Perl script I wrote around 1995-1998 that ran from cron and sent email to my personal account. I quit that job, but the server running it got migrated to virtual machines and didn't stop sending me email until about 2017, at least three sales or corporate takeovers later. (It was _probably_ running on CentOS 4 when I last touched it around 2005; I'd love to know whether it was just turned into a VM and kept running as part of critical infrastructure on CentOS 4 twelve years later.)
But most code only lasts as long as the idea, the money, or the people behind the idea last. All the websites and differently skinned CRUD apps I built or managed rarely lasted 5 years without being either shut down or rewritten from the ground up by new developers or leadership in whatever the Resume Driven Development language or framework was at the time: toss out the Perl and rewrite it in Python, toss out the Python and rewrite it in Ruby on Rails, then decide we need Enterprise Java to post about on LinkedIn, then rewrite that in Node.js, now toss out the Node and use Go or Rust. I'm reasonably sure this year's or perhaps next year's LLM coding tools can do a better job of those rewrites than the people who actually did them...
Will the cost of AI-generated code approach zero? I thought the hardware and electricity needed to train the models and run inference were huge and only growing. Today the free and plus plans might be only $20/month; once moats are built I assume prices will skyrocket an order of magnitude or a few higher.
> Will the cost of AI-generated code approach zero?
Absolutely not.
In the short term it will, while OpenAI/Anthropic/Anysphere destroy software development as a career. But they're just running the Uber playbook - right now they're giving away VC money by funding the datacenters that're training and running the LLMs. As soon as they've put enough developers out of jobs and ensured there's no new pipeline of developers capable of writing code and building platforms without AI assistance, they will stop burning VC cash and start charging at rates that not only break even but also return the 100x the investors demand.
They're not directly solving the same problem. MCP is for exposing tools, such as reading files; A2A is for agents talking to other agents to collaborate.
MCP servers can expose tools that are agents, but don't have to, and usually don't.
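To make that concrete, here's roughly what the MCP side looks like: a minimal server exposing one file-reading tool, written against the official Python SDK's FastMCP helper (the tool itself is a made-up example):

```python
# Minimal MCP server exposing a single tool -- illustrative sketch
from pathlib import Path

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("file-tools")

@mcp.tool()
def read_file(path: str) -> str:
    """Return the contents of a text file."""
    return Path(path).read_text()

if __name__ == "__main__":
    mcp.run()  # serves over stdio by default
```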
That being said, I can't say I've come across an actual implementation of A2A outside of press releases...
Perhaps it's naive to say, but I think there was the briefest moment, when your status updates started with "is", feeds were chronological, and photos and links weren't pushed over text, that it was not an adversarial actor to one's wellbeing.
There was an even briefer moment where there was no such thing as status updates. You didn't have a "wall." The point wasn't to post about your own life. You could go leave public messages on other people's profiles. And you could poke them. And that was about it.
I remember complaining like hell when the wall came out, that it was the beginning of the end. But this was before publicly recording your own thoughts somewhere everyone could see was commonplace, so I did it by messaging my friends on AIM.
And then when the Feed came out? It was received as creepy and stalkerish. And there are now (young) adults born in the time since who can't even fathom a world without ubiquitous feeds in your pocket.
Unless I’m remembering wrong, posting a public message on someone else’s profile was posting on their wall. Or was it called something else before it was somebody’s wall?
It didn't have a name. It wasn't really a "feature." You just went and posted on their "page" I guess I would call it.
The change to being able to post things on your own page and expecting other people to come to your page and read them (because, again, no Feed) wasn't received well at first.
Keep in mind, smartphones didn't exist yet, and the first ones didn't have selfie cameras even once they did. And the cameras on flip phones were mostly garbage, so if you wanted to show a picture, you had to bring a camera with you, plug it in, and upload it. So at first the Wall basically replaced AIM away messages so you could tell your friends which library you were going to go study in and how long. And this didn't seem problematic, because you were probably only friends with people in your school (it was only open to university students, and not many schools at first), and nobody was mining your data, because there were no business or entity pages.
Yeah, that's about when it changed. The lack of a wall was a very early situation. I joined in 2004, back when it was only open to Ivy League and Boston-area schools.
It was still acceptable to write on someone else's wall when they came to be called that. You can still do that now, I think, but it's quite uncommon, and how it works is now complicated by settings.
Sure, you could. That wasn't the problem. The problem was that now you could post on your own.
That's what turned it from a method of reaching out and sending messages to specific people when you had something to say to them to a means of shouting into the void and expecting (or at least hoping) that someone, somewhere, would see it and care what you had to say. It went from something actively pro-social to something self-focused.
Blogs and other self-focused things already existed, but almost nobody used them for small updates throughout the day. Why do you think the early joke about Twitter was that it was just a bunch of self-absorbed people posting pictures of their lunch? Nobody knew what to do with a tool like that yet, but the creation of that kind of tool has led to an intensity of self-focus and obsession the world had never seen before.
I made the mistake of sending a Gen Z (adult) friend a poking finger emoji to try to remind him about something.
It wasn't the first time I've had a generational digital (ha) communication failure, but it was the first time I've had one because I'm old and out of touch with what things mean these days!
My hunch is that instant messaging is slowly taking over that space. If you actually want to connect with people you can without needing much of a platform.
I mean let's be clear on the history and not romanticize anything, Zuck created Facebook pretty much so he could spy on college girls. He denies this of course, but it all started with his Facemash site for ranking the girls, and then we get to the early Facebook era and there's his quote about the "4,000 dumbfucks trusting him with their photos" etc.
There is no benevolent original version of FB. It was a toy made by a college nerd who wanted to siphon data about chicks. It was more user friendly back then because he didn't have a monopoly yet. Now it has expanded to siphoning data from the entire human race and because they're powerful they can be bigger bullies about it. Zuck has kind of indirectly apologized for being a creeper during his college years. But the behavior of his company hasn't changed.
After converting many of my projects, and helping a couple startups tool their codebases and teams for using AI agents better, these 5 things are what I now do on every codebase I work on.
There’s a really common cognitive fallacy of “the consequences of that are something I don’t like, therefore it’s wrong”.
It’s like reductio ad absurdum, except the logical consequence of the argument isn’t incorrect, just bad.
You see it all the time, especially when it comes to predictions. The whole point of this article is coding agents are powerful and the arguments against this are generally weak and ill-informed. Coding agents having a negative impact on skill growth of new developers isn’t a “fundamental mistake” at all.
What I’ve been saying to my friends for the last couple of months is that we’re not going to see coding jobs go away, but we’re going to run into a situation where it’s harder to grow junior engineers into senior engineers, because the LLMs will be doing all the work of figuring out why it isn’t working.
This will IMO lead to a “COBOL problem” where there is a shortage of people with truly deep understanding of how it all fits together who can figure out the line of code to tweak to fix that ops problem that’s causing your production outage.
I’m not arguing for or against LLMs, just trying to look down the road to consequences. Agentic coding is going to become a daily part of every developer’s workflow; by next year it will be table stakes - as the article said, if you’re not already doing it, you’re standing still: if you’re a 10x developer now, you’ll be a 0.8x developer next year, and if you’re a 1x developer now, without agentic coding you’ll be a 0.1x developer.
It’s not hype; it’s just recognition of the dramatic increase in productivity that is happening right now.
I'd think this would work, as the 4 tools listed are about retrieving information to give agents more context about correct providers and modules. Given that terragrunt works with terraform directly, I'd think it would help there as well; just add rules/prompts that are explicit about the generated code following terragrunt file structure and terragrunt commands, something like the sketch below.
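A made-up example of such a rules snippet:

```markdown
- This repo uses Terragrunt on top of Terraform.
- Every environment gets its own terragrunt.hcl under live/<env>/<module>/.
- Plan and apply with `terragrunt plan` / `terragrunt apply`, never raw `terraform`.
- Shared modules live in modules/ and are referenced via the source attribute.
```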
Thanks, this is helpful. I tried Claude Code, and thought it had a lot of potential, but I was on track to spend at least $20/day.
For a tool that radically increases productivity (say 2x), I think it could still make sense for a VC funded startup or an established company (even $100/day or $36k/year is still a lot less than hiring another developer). But for a side project or bootstrap effort, $36k/year obviously significantly increases cash expenses. $100/month does not, however.
So, I'm going to go back, upgrade to Max, and try it again. If that keeps my costs to $100/month, that's a really different value proposition.
Can you clarify what you mean here? Are you saying I can use Claude Code for a flat rate of $100/month? What are the limits? What if I use more than $100 worth of Code in a month? Their website doesn't seem to make it clear.
Edit:
Found the answer to my own questions
> Send approximately 50-200 prompts with Claude Code every 5 hours[1]
Really tempted to go for this as well. Only wish I could access flat rate Claude through VS Code Cline (or an extension like it) as well - that would be the complete package. $100 / month + ~$$ / day in API credits is gonna get pricey.
The way Claude Code is going is exactly what I want out of an agentic coding tool with this "unix toolish" philosophy. I've been using Claude Code since the initial public preview release, and have seen the direction over time.
The "golden" end state of coding agents is that you give it a feature request (e.g. a Jira ticket), and it gives you a PR to review and give feedback on. Cursor, Windsurf, etc. are dead ends in that sense, as they are local editors and cannot run in CI.
If you are tooling your codebase for optimal AI usage (rules, MCP, etc.), you should target a technology that can bridge the gap to headless usage. The fact that Claude Code can trivially be used as part of automation means it's now the default way I think about coding agents (Codex, the npm package, is the same).
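Assuming the current CLI flags (worth double-checking against the docs), a headless CI step could look something like:

```bash
# Headless Claude Code in a CI job -- illustrative; TICKET-123.md is invented
claude -p "Implement the change described in TICKET-123.md, then run the tests" \
  --output-format json > result.json
```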
Disclaimer, I focus on helping companies tool their codebases for optimal agent usage, so I might have a bias here to easily configurable tools.
Not sure about that golden end state. Mine would be being in a room surrounded by screens with AI agents coding, designing, testing, etc. I would be there in the center giving guidance, direction, applying taste, etc…
All conversational, wouldn’t need to touch the keyboard 99% of the time.
I hate using voice for anything. I hate getting voice messages, I hate creating them. I get cold sweats just thinking about having to direct 10 AI Agents via voice. Just give me a keyboard and a bunch of screens, thanks.
I'm a millennial. I refuse to use voice controls. Never used them in my life and hope I never have to. There's a block in my brain that just refuses to let me talk to a machine to give it orders.
Though I'll gladly call it various foul names when it's refusing to do what I expected it to do.
My jaw hurts after an hour long meeting. I lose my voice after 2 hours. Can’t say I’ve ever noticed finger fatigue, even after 16 hours of typing and playing guitar.
Yeah, I think I’d rather click and type than talk, all day.
Probably worth trying one of the many dictation apps out there based on whisper. They can get most coding terms (lib names, tech stack names) accurately, and it's one of those things you have to really try for a week before dismissing fully.
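If you want to gauge the underlying model before committing to an app, the open-source whisper package is a few lines (model size and file name are placeholders):

```python
# Quick local dictation test with openai-whisper -- illustrative
import whisper

model = whisper.load_model("base.en")  # bigger models trade speed for accuracy
result = model.transcribe("dictation.wav")
print(result["text"])
```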
Some of us who’ve been in this game for a while consider having healthy hands to be a nice break between episodes of RSI, PT, etc. YMMV of course but your muscle stamina won’t be the problem, it’s your tendons and eventually your joints.
How many of you people having problems with hand health vis a vis typing are still using home row?
I've done more typing than speaking for over 40 years now, and I've never had any carpal tunnel or joint problems with my hands (my feet on the other hand.. hoo boy!), and I've always used a standard layout flat QWERTY keyboard.. but I never bend my hands into that unnatural "home row" position.
I type >60wpm using what 40 years ago was "hunt and peck" and evolved through brute-force usage into "my hands know where the keys are; I am right-handed so my right hand monopolizes 2/3 of the keyboard, but both hands know where every key is, so either one can take over the keyboard if the other is unavailable (holding food, holding a microphone for when I do do voice work, using the mouse, etc.)".
But as a result my hands also evolved this bespoke typing strategy which naturally avoids uncomfortable poses and uncomfortable repetition.
I'd wager that probably covers only ~30% of the world population, but considering that people who speak Mandarin, for example, use other apps, it probably covers an even larger slice of the WhatsApp userbase.
I’m the same. I love that writing allows you to think while typing so that you can review and revise your thoughts before letting them out in the world.
And don’t get me started on video vs text for learning purely non-physical stuff like programming…
I'm another millennial that doesn't like them. I type pretty fast, around 100 WPM, so outside environments where I can't type (e.g. while driving), I just never saw the appeal. Typing has a way of helping me shape my thoughts precisely that I couldn't replicate with first thinking about what I want to say, and then saying it precisely.
But I can appreciate that sitting down in front of a keyboard and going at it with low typing speed seems unnatural and frustrating for probably the majority of people. To me, in front of a keyboard is a fairly natural state. Somebody growing up 15 years before (got by without PCs in their early years) or after me (got by with a smartphone) probably doesn't find it as natural.
It's practice... Consciously try using the voice input for a while and see how you feel after a few days. I ended up liking it for some things more than others. This is typed via voice with minor edits after. This relies on the new models though - the older systems just didn't work as well.
I've consciously tried doing this for the past month on Android when chatting to Claude... when I'm alone. Don't think I could ever feel comfortable doing it around people.
I think I'm marginally faster using speech to text than using a predictive text touch keyboard.
But it makes enough mistakes that it's only very slightly faster, and I have a very mild accent. I expect for anyone with a strong accent it's a non starter.
On a real keyboard where I can touch type, it's much slower to use voice. The tooling will have to improve massively before it's going to be better to work by speaking to a laptop.
Voicemail universally sucks. However, when you're having a synchronous conversation with actual people, do you prefer to do everything via IM, or would you prefer a phone call?
Email. Async comms make sense 99% of the time at my job. Unless there's deep work to be done, or pie-in-the-sky idea fabricating. Or rubber-ducky sessions. But I won't do those with AI.
Email is Calm Technology[0] for collaborative knowledge work, where you're expected to spend hours on a single task. If something needs brainstorming, or quick back and forth, you jump on a more synchronous type of conversation (IM, call, in-person meeting).
I almost never prefer a phone call, I'd rather go all the way to video/in-person or stick with text. I also prefer to push anything important that isn't extremely small out of instant messaging and to email.
Brainstorming/whiteboarding, 1:1s or performance feedback, team socialization, working through something very difficult (e.g. pair debugging): in-person or video
Incidents, asking for quick help/pointers, small quick questions, social groups, intra-team updates: IM
Bigger design documents and their feedback, trickier questions or debugging that isn't urgent, sharing cool/interesting things, inter-team updates: Email
> do you prefer to do everything via IM, or would you prefer a phone call?
It's hard for me to believe that there are psychopaths among us who prefer a call on the phone, a Slack huddle, or even organizing meetings instead of just calmly writing messages on IM over coffee.
Yes, this is known etiquette, e.g. in China, where voice memos are widely used on WeChat.
Sending a voice memo is slightly rude for business, as it says that I, the sender, value my time in dashing something off more than yours, even though it's inconvenient for you, the receiver, who then has to stop and listen to it.
Between friends is a bit different as voice has a level of personal warmth.
I would agree, but I use voice heavily with AI agents, and here is why: no matter how fast I can type, I can speak much faster, and while I do other tasks.
One advantage is speaking is generally faster than typing. Imagine instead of talking to a bunch of AI you’re talking to a room full of coworkers about the architecture to develop.
If that’s the future, that means a massive reduction in software engineers no? What you are describing would require one technical product manager, not a team of software engineers.
I would guess it's most likely both. The world could use a lot more software but it's not an unlimited appetite and the increase in productivity of SWEs will depress wages.
How many places have you worked where there's no backlog in Jira and the engineers legitimately have nothing to do other than sit around waiting for work to get assigned ‽
Define everyone. I know a lot of SWEs who don't take their job for granted, always strive to add value, try to keep their skills current, and try to be extremely helpful. Maybe in SV, where the salaries are high, there is some schadenfreude, but I don't see that in general for what is a worldwide industry. In most places it's just a standard job.
I don't understand the pleasure taken in putting people out of work and the pain inflicted on people's lives and careers, but I guess that's just me.
Except that AI agents are the new offshoring. The new hotshot developer will be someone who understands what clients want deeply, knows the domain, has sufficient engineering skill to understand the system that needs to be built and is able to guide swarms of coding agents efficiently.
Having all this in one person is super valuable because you lose a lot of speed and fidelity in information exchange between brains. I wouldn't be surprised if someone could hit like 30-50 kloc/day within a few years. I can hit 5-10kloc/day doing this stuff depending on a lot of factors, and that's driving ~2 agents at a time mostly. Imagine driving 20.
You can't just be a solution architect, you have to be a systems architect, which is sort of the culmination of the developer skillset. I don't write code anymore really, but I know the purpose of everything my agents are doing and when they're making mistakes. I also have to know the domain, and be able to interact with clients, but without the technical chops I wouldn't be able to deliver on the level that I do.
How hard do you really think the job of "technical product manager" is? I'm not asking in a childish "management doesn't do anything" sort of way, but I want to frame the question: "if software engineers needed to retrain to be technical product managers, how many would sink, and how many would swim?"
I can easily see this happening in 2-3 years. Some chat apps already have outstanding voice modes, such as ChatGPT with GPT-4o. It's just a matter of integrating that voice mode, and getting the understanding and generated code to be /slightly/ better than it is today.
It seems unlikely that any one individual would be able to output a sufficient amount of context for that to not go off the rails really quickly (or just be extremely inefficient as most agents sit idle waiting for verification of their work)
No. The "golden" end state of coding agents is free and open source coding agents running on my machine (or on whatever machine I want). Can you imagine paying for every command you run in your terminal? For every `ls`, `ps`, `kill`? Makes no sense, right? Well, same for LLMs.
I'm not saying "ban proprietary LLMs", I'm saying: hackers (the ones that used to read sites like this) should have free and open source ones as their main tools.
> Can you imagine paying for every command you run in your terminal?
Yes, because hardware and electricity aren't free.
I literally DO pay for every command. I just don't get an itemized bill so there's no transparency about it. Instead, I made some lump-sum hardware payment which is amortized over the total usage I get out of it, plus some marginal increase in my monthly electric bill when I use it.
Sure but the same thing would apply to the original comment, only that it's a locally hosted LLM that you're buying electricity for. That's different than paying rent for the privilege of using those commands and being at the mercy of the providers who choose to modify or EOL those commands as they see fit.
I agree with the sentiment, but isn’t Claude Code (the CLI) FOSS already? (Not sure it’s coupled to Claude the model API either, but if it is I imagine it’s not too hard to fix.)
> Cursor, Windsurf, etc. are dead ends in that sense, as they are local editors and cannot run in CI.
I was doing this with Cursor and MCPs. Got about a full day of this before I was rate limited and dropped to the slowest, dumbest model. I’ve done it with Claude too and quickly exhaust my rate limits. And the PRs are only “good to go” about 25% of the time, and it’s often faster to just do it right than find out where the AI screwed up.
> The "golden" end state of coding agents is that you give it a feature request (e.g. a Jira ticket), and it gives you a PR to review and give feedback on.
I see your point, but on the other hand, how depressing to be left only with the most soul-crushing part of software engineering: the Jira ticket.
I personally find figuring out what the product should be is the fun part. There's still a need for architecting a plan, but the actual act of writing code isn't what gives me personal joy; it's the building of something new.
I understand the craft of code itself is what some people love though!
Thing is, LLMs are already better than people at the "architecting a plan" and "figuring out what the product should be" in details that go beyond high-level vibes. They do that even better than raw coding.
In fact, that's the main reason I like developing quick prototypes and small projects with LLMs. I use them less to write code for me, and more to cut through the bullshit "research" phase of figuring out what code to write, which libraries to pick, what steps and auxiliary work I'm missing in my concept, etc.
They’re great if word count is your measure. But it’s hard for LLMs to know the whole current SOTA and come up with something innovative and insightful. The same as 99% of human proposals.
Can LLMs come up with the 1% of ideas that break through, paired with great execution?
LLMs definitely know more of the current SOTA in everything than anyone alive, and that doesn't even count in the generous amount of searching capability granted to them by vendors. They may fail to utilize results fully due to limited reasoning ability, but they more than make up for it in volume.
> Can LLMs come up with the 1% of ideas that break through, paired with great execution?
It's more like 0.01%, and it's not the target anyway. The world doesn't run on breakthroughs and great execution, it runs on the 99.99% of the so-so work and incremental refinement.
Say what you will, but this would have the wonderful side effect of forcing people who write JIRA tickets to actually think through and clearly express what it is they want built.
The moment I am able to outsource work for Jira tickets at a level where AI actually delivers a reasonable pull request, many corporate managers will seriously wonder why they keep the offshoring team around.
It seems like the Holy Grail here has become: "A business is one person, the CEO, sitting at his desk doing deals and directing virtual and physical agents to do accounting, run factories, manage R&D, run marketing campaigns, everything." That's it. A single CEO, (maybe) a lawyer, and a big AI/robotics bill = every business. No pesky employees to pay. That's the ultimate end game here, that's what these guys want. Is that what we want?
Keep going, the end end goal is that even the customers are AI. And the company doesn't sell anything or do anything, it just trades NFTs and stocks and digital goods. And the money isn't real, it's all crypto. This is the ideal, to create nothing, to sell nothing to no one, and for somehow that to mean you created "value" to society and therefore should be rewarded in material terms. And greatly at that, the people setting all this up expect to be at the tippy top of the social ladder for this "contribution".
This is I guess what happens when you follow capitalism to its logical conclusion. It's exactly what you expect from some reinforcement learning algorithm that only knows how to climb a gradient to maximize a singular reward. The concept of commerce has become the proverbial rat in the skinner box. It has figured out how to mainline the heroin drip if it just holds down the shock button and rewires its brain to get off on the pain. Sure it's an artificial high and hurts like hell to achieve it, but what else is there to live for? We made the line going up mean everything, so that's all that matters now. Doesn't matter if we don't want it, they want it. So that's what it's going to be.
The owner (human) would say "build a company, make me a billion dollars" and that would be the only valuable input needed from him/her. Everything else would be derived and executed by the AI swarm, while the owner plays video games (or generally enjoys the products of other people's AI labor) 100% of the time.
I'd argue GPT4 (2022) was already AGI. It could output anything you (or Tim Cook, or any other smart guy) could possibly output, given the relevant context. The reason it doesn't right now is that we are not passing in all of your life's context. If we achieve this, a human CEO has no edge over an AI CEO.
People are figuring this problem out very quickly, hence the explosion of agentic capabilities happening right now, even though the base model fundamentally does the same stuff as GPT4.
Of all the professions that are at the risk of being downsized, I think lawyers are up there. We used to consult our lawyers so frequently about things big and small. We have now completely removed the small stuff from that equation. And most of our stuff is small. There is very little of the big stuff and I think LLMs aren't too far from taking care of that as well.
Yup, I have said for the past year to anyone who'll listen that the concept of hourly (white collar) work will go away.
And there's no better example of hourly work than lawyers.
Personally, I've always disliked the model of billing by the hour because it incentivizes the wrong things, but it is easier to get clients to justify these costs (because they're used to thinking in that framework).
I'd rather take on the risk and find ways to do more efficient work. It's actually FUN to do things that way. And nowadays, this is where AI can benefit in that framework the most.
So far, automation has only ever increased the need for software development. Jevons Paradox plus the recursive nature of software means that there's always more stuff to do.
The real threats to our profession are things like climate change, extreme wealth concentration, political instability, cultural regression and so on. It's the stuff that software stands on that one should worry about, not the stuff that it builds towards.
Maybe I’m not thinking big-picture enough… but have you ever tried using generative AI (i.e., a transformer) to create a circuit schematic? They fail miserably. Worse than GPT-2 at generating text.
The current SOTA models can do some impressive things, in certain domains. But running a business is way more than generating JavaScript.
The way I see it, only some jobs will be impacted by generative AI in the near term. Not replaced, augmented.
Because of human factors: no complaints, overtime for as long as the electricity is on, no unions, and everything else that a CEO beholden to the whims of exponential growth for their shareholders likes to do so much.
Aider is definitely in the same camp. Last time I checked, they weren't optimizing for the full "agent infinitely looping until completion" usecase, and didn't have MCP support.
But it's 100% the same class of tool, and the awesome part of the unixy model is that hopefully agents can be substituted for each other in your pipeline, whichever one is better for the usecase, just like models are interoperable.
I tried aider today with a Gemini API key and billing account. It's not close to the experience I had with Claude Code on Saturday, which was able to implement a full feature.
The main difference is that I interact with Claude Code only through conversation. Aider felt much more like I was talking to two different tools, the model and Aider. For example, constantly having to add files, and parsing the less-than-ideal console output compared to how Claude Code handles user feedback.
"Aider felt much more like I was talking to two different tools"
I personally see that as a plus, because other tools are lacking on the tool side. Aider seems to have solid "traditional" engineering behind its tooling.
"constantly having to add files"
That's fair. However, Aider automatically adds files that trigger it via comments and it asks to add the files that are mentioned in the conversation.
"parse the less than ideal console output"
That's fair too. Still, the models aren't there yet, so I value tools that don't hide the potential crap that the models produce 20-30% of the time.
The vision of submitting a feature request and receiving a ready-to-review PR is equally compelling and horrifying from the standpoint of strategy management.
Anthropic, like most big tech companies, doesn't want to show off its best until it needs to. They used to stockpile some cool features, and they had time to think about their strategy. But now I feel like they are in a rush to show off everything, and I'm worried whether the management has time to think about the big picture.
Setting aside predictions about the future and what is best for humanity and all that for a moment: this is just such a bummer on a personal level. My whole job would become the worst parts of my job.
(please pardon the self-promotion) This is exactly what my product https://cheepcode.com does (connects to your Linear/Jira/etc and submits PRs to GitHub) - I agree that’s the golden state, and that’s why I’m rushing to get out of private beta as fast as I can* :) It’s a bootstrapped operation right now which limits my speed a bit but this is the vision I’ve been working towards for the past few months.
*I have a few more safety/scalability changes to make but expecting public launch in a few weeks!
> The "golden" end state of coding agents is that you give it a Feature Request (EG Jira ticket), and it gives you a PR to review and give feedback on. Cursor, windsurf, etc, are dead ends in that sense as they are local editors, and can not be in CI.
Isn’t that effectively the promise of the most recently released OpenAI codex?
From the reviews I’ve been able to find so far though, quality of output is ehh.
Played around with connecting https://github.com/eyaltoledano/claude-task-master via MCP to create a PRD, which basically replaces the ticket-grooming process, and then executing it with Claude Code: creating a branch named like the ticket and pushing after having created the unit tests, with constant linting. Hooking the server up looked roughly like the sketch below.
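This config entry is a guess from memory (check the task-master README for the real package name):

```jsonc
// .mcp.json -- illustrative project-level MCP registration for Claude Code
{
  "mcpServers": {
    "task-master": {
      "command": "npx",
      "args": ["-y", "task-master-ai"]
    }
  }
}
```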
I'm about 50kloc into a project making a react native app / golang backend for recipes with grocery lists, collaborative editing, household sharing, so a complex data model and runtime. Purely from the experiment of "what's it like to build with AI, no lines of code directly written, just directing the AI."
As I go through features, I'm comparing a matrix of Cursor, Cline, and Roo, with the various models.
While I'm still working on the final product, there's no doubt to me that Sonnet is the only model that works with these tools well enough to be Agentic (rather than single file work).
I'm really excited to now compare this 3.7 release and how good it is at avoiding some of the traps 3.5 can fall into.
"no lines of code directly written, just directing the AI"
/skeptical face.
Without fail, every. single. person. I've met who says that actually means "except for the code that I write", or "except for how I link the code it built together by hand".
If you are 50kloc into a large, complex project that you have literally written none of, and have, e.g., used Cursor to generate the code without any assistance... well, you should start a startup.
...because, that's what devin was supposed to be, and it was enormously and famously terrible at it.
So that would be either a) terribly exciting, or b) hyperbole.
I’m currently doing something very similar to what GP is doing: building a hobby project that’s a desktop app with a web frontend. It’s a map editor with a 3D view. My estimate is that 80-90% of the code was written by AI. Sure, I did have to intervene or write some of the more complex parts myself, but it’s still exciting to me that in many cases it took just a single prompt to add a new feature or change existing behavior. Judging from the complexity of the project, it would have taken me 4-5x longer to write it completely by hand. It’s a game changer for me.
> My estimate is that 80-90% of the code was written by AI
Nice! It is entirely reasonable both to do that and to be excited about it.
…buuut, if that’s what you’re doing, you should say so.
Not:
“no lines of code directly written, just directing the AI”
Because those (gluing together AI code by hand and having the agent do everything) are different things, and one of them is much much MUCH harder to get right than the other one.
That last 10-15%. Self-driving cars are the same story, right?
I don’t think this is a fair take. For self driving cars, you care about that because safety is involved and the reliability of the AI is the product itself.
For OP, the product is the product, how they got there is mostly irrelevant. We don’t really care what IDE they used (outside of being a tooling nerd).
AI is hard; edge cases are hard. AI sucks at edge cases.
Between AI for cars and AI for software the long tail of edge cases that have to be catered for is different, yes.
...but I'm sure the same will apply for AI for art (e.g. hands), and AI for (insert domain here).
Obviously no analogy is perfect, but I think you have to really make an effort to look away from reality not to see the glaringly obvious parallels in cars, art, programming, problem solving, robots, etc. where machine learning models struggle with edge cases.
Does the tooling they used matter? No, not at all.
...but if they've claimed to solve the 'edge case problem', they've done something really interesting. If not, they haven't.
So, don't claim to have done something really interesting if you haven't.
You can say "I've been using AI to build a blah blah blah. It's great!" and that's perfectly ok.
You have to go out of your way to say "I've been using an AI to build blah blah blah and I haven't written any of it, it's all generated by AI". <-- kinda attention seeking.
"no lines of code directly written" really? Why did you mention that? You got the AI to write your software for you? That sounds cool! Let's talk! Are you an AI consultant by any chance? (yes, they are). ...but.
No. You didn't. You really didn't. I'm completely happy to call people out for doing that; its not unfair at all.
That's the point of the experiment I'm doing: what it takes to get these things to generate all the code while I'm just directing.
I literally have not written a line of code. The AI agent configures the build systems. It executes the `go install` command. It configures the infrastructure via terraform.
It takes a lot of reading of the code that's generated to see what I agree with or not, and redirecting refactorings. Understanding how to describe problem statements that are translated into design docs that are translated into task lists. It's still a lot of knowledge work on how to build software. But now I can do the coding that might have taken a day from those plans in 20 minutes.
Regarding startups, there's nothing here I'm doing that isn't just learning the tools of agentic coding. The business here might be advising people on how to do it themselves.
If you know how to architect code well, you can guide the AI to create smaller more targeted modules. That way as you 'write code with AI', you give it a targeted subset of the files to edit on each prompt.
In a way the AI becomes the dev and you become the code reviewer. Often as the AI is writing the code, you're thinking about the next step.
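In practice that kind of targeted prompt might look like this (the files and feature are invented):

```text
Refactor the input validation. Only touch:
  - internal/validate/rules.go
  - internal/validate/rules_test.go
Keep the exported API unchanged, and add a test for the empty-input case.
```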