Add "Complete this request as a single task and do not ask any follow-up questions." Or some variation of that. They keep screwing with default behavior, but you can explicitly direct the LLM to override it.
What I take from this is that LLMs are somewhat miraculous in generation but terrible at revision. Especially with images, they are very resistant to adjusting initial approaches.
I wonder if there is a consistent way to force structural revisions. I have found Nano Banana particularly terrible at revisions; even for something as simple as "change the image dimensions to...", it will confidently claim success but do nothing.
A thing I've been noticing across the board is that current generative AI systems are horrible at composition. It's most obvious in image generation models, where the composition and blocking tend to be jarringly simple and on-the-nose (hyper-symmetry, everything in the middle ground, or one of about three canned "artistic" compositions) no matter how you prompt them, but you see it in text output as well once you notice it.
I suspect this is either a training data issue, or an issue with the people building these things not recognizing the problem, but it's weird how persistent and cross-model the issue is, even in model releases that specifically call out better/more steerable composition behavior.
I almost always get better results from LLMs by going back and editing my prompt and starting again, rather than trying to correct/guide it interactively. Almost as if having mistakes in your context window is an instruction to generate more mistakes! (I'm sure it's not quite that simple)
I see this all the time when asking Claude or ChatGPT to produce a single-page two-column PDF summarizing the conclusions of our chat. Literally 99% of the time I get a multi-page, unpredictably formatted mess, even after gently asking over and over for specific fixes to the formatting mistakes.
And as you say, they cheerfully assert that they've done the job, for real this time, every time.
Ask for the asciidoc and asciidoctor command to make a PDF instead. Chat bots aren’t designed to make PDFs. They are just trying to use tools in the background, probably starting with markdown.
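As a sketch of that workflow (assuming the asciidoctor-pdf gem is installed, e.g. via `gem install asciidoctor-pdf`; the file name and contents here are just an example):

```shell
# Save the AsciiDoc source the chatbot gives you to a file.
cat > summary.adoc <<'EOF'
= Chat Summary
:pdf-page-size: A4

== Conclusions
* First conclusion here.
* Second conclusion here.
EOF

# Render it to PDF locally; skip if the gem isn't installed.
command -v asciidoctor-pdf >/dev/null && asciidoctor-pdf summary.adoc
```

This way the model only has to produce structured text, and the deterministic toolchain handles the layout.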
Tools are still evolving out of the VLM/LLM split [0]. The reason image-to-image tasks are so variable in quality and vastly inferior to text-to-image tasks is because there is an entirely separate model that is trained on transforming an input image into tokens in the LLM's vector space.
The naive approach that gets you results like ChatGPT is to produce output tokens based on the prompt and generate a new image from the output. It is really difficult to maintain details from the input image with this approach.
A more advanced approach is to generate a stream of "edits" to the input image instead. You see this with Gemini, which sometimes maintains original image details to a fault; e.g. it will preserve human faces at all cost, probably as a result of training.
I think the round-trip through SVG is an extreme challenge to train through and essentially forces the LLM to progressively edit the SVG source, which can result in something like the Gemini approach above.
Revision should be much easier than generation, e.g. reflection style CoT (draft-critique-revision) is typically the simplest way to get things done with these models. It's always possible to overthink, though.
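The draft-critique-revise loop is simple to sketch. Here `call_model` is a placeholder for whatever LLM API you actually use; the stub below only exists to demonstrate the control flow:

```python
def call_model(prompt: str) -> str:
    """Placeholder for a real LLM call; returns canned text for illustration."""
    if prompt.startswith("CRITIQUE"):
        return "The draft is too vague; add a concrete example."
    if prompt.startswith("REVISE"):
        return "Revised answer with a concrete example."
    return "First draft of the answer."

def draft_critique_revise(task: str, rounds: int = 1) -> str:
    """Reflection-style CoT: draft once, then alternate critique and revision."""
    draft = call_model(f"DRAFT: {task}")
    for _ in range(rounds):
        critique = call_model(f"CRITIQUE this draft:\n{draft}")
        draft = call_model(f"REVISE the draft using this critique:\n{critique}\n\n{draft}")
    return draft

print(draft_critique_revise("Summarize the report"))
```

Keeping the critique in a separate call, rather than asking the model to self-correct mid-conversation, is what avoids the "mistakes in context beget more mistakes" problem mentioned above.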
Nano Banana is rather terrible at multi-turn chats, just like any other model, despite the claim that it's been trained for them. Scattered context and irrelevant distractors are always bad; compressing the conversation into a single turn fixes this.
I’m not quite sure.
I think that adversarial networks work pretty well at image generation.
I think that the problem here is that SVG is structured information while an image is an unstructured blob, and the translation between them requires planning and understanding. Maybe treating an SVG like a raster image in the prompt is the wrong approach. I think prompting with the image as code (which SVG basically is) would produce better outputs.
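Treating the SVG as code means edits like the "change the image dimensions" request above become trivial attribute rewrites. A sketch using the Python stdlib XML parser on a made-up minimal SVG:

```python
import xml.etree.ElementTree as ET

# A hypothetical minimal SVG, treated as structured source rather than pixels.
svg_source = (
    '<svg xmlns="http://www.w3.org/2000/svg" width="100" height="100">'
    '<circle cx="50" cy="50" r="40"/></svg>'
)

# Keep the default namespace unprefixed when serializing back out.
ET.register_namespace("", "http://www.w3.org/2000/svg")
root = ET.fromstring(svg_source)

# "Change the image dimensions" is just an attribute edit on the root element.
root.set("width", "200")
root.set("height", "150")

print(ET.tostring(root, encoding="unicode"))
```

An LLM that emits this kind of targeted edit to the SVG source has a much easier verification story than one asked to "see" the rendered raster.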
The prompt just said to iterate until they were satisfied. Adding something like "don't be afraid to change your approach or make significant revisions" would probably give different results.
As an engineer and artist, I think a better comparison is painting -> photography. It took quite a while for photography to be considered an art, since it removed so much of the creative control from the artist. But it replaced them with new and different skills, particularly the value of curation.
Some skills, like framing, values, balance, etc. become even more important differentiators. Yes, it is much different. But as long as humans are in the loop, there is an opportunity for human communication.
>Some skills, like framing, values, balance, etc. become even more important differentiators.
I agree. I think many artists in the future will be closer to directors/cinematographers/editors than performers
Many of the skills artists have today will still be necessary and transferable, but what will separate the good artists from the bad artists will be their ability to communicate their ideas to agents / other humans
Same with software developers I suspect - communication will be the most important skill of all, and the rockstar loner devs who don't work well in teams will slowly phase out
I thought this was obvious? These are social games where everyone is in the same funnel and the players with the most time dominate the others, but they also need new objectives. At the beginning you quest with people at your level, but these games always devolve into bigger, more tedious tasks (raids) with less and less differentiated rewards (a 1% chance of a drop that boosts you 2%), because otherwise you end up with players at level 283 and no way to balance team dynamics when some people scale infinitely.
I thought this was fairly well proven at this point. If one tree is distressed, nearby trees become aware of it through signal passing using mycelium (which has more nodes in a forest than the human brain has neurons).
Fungi are deeply alien life. Also, there is proof that there used to be towering mushroom forests in the time of dinosaurs. And if you pick up a boring brown mushroom in the forest there is a reasonable chance it is an unidentified species, since there are several that are indistinguishable except by full analysis (which there is little focus on).
• Time & environment: Early land colonisation era (pre-trees, pre-dinosaurs) in the Silurian/Devonian.
• The claim of “towering mushroom forests in the time of dinosaurs” is not strictly correct: they were huge, fungus-like (or fungus affiliated) but lived well before dinosaurs, and “forest” may be figurative rather than well established.
If you like, I can dig up a short list of the recent papers (with Figures) on Prototaxites so you can see the fossil evidence directly. Would that be helpful, Rob Mpucee?
===
That’s a wild answer lol. Although it technically did answer the question.
Was it Snow Crash or Diamond Age (or something earlier?) that had mushrooms as the basis for advanced technology? I'm curious if there was actual insight there or a happy coincidence.
It wasn't quite fungal, but Diamond Age had the Seed. The Seed grew any material/structure/device desired as opposed to the Matter Compilers and the Feed. We know that the Seed required tons of computing power to design and create, but the exact method of function of the Seed wasn't detailed. We do, however, know that it may have been nanotechnological as Nell had nanotechnology that her Mouse Army created to counter the nanites in the Drummers.
The Stone Sky series by NK Jemisin referenced the use of genetically modified fungi as self-assembling construction material. Really cool to see how pervasive mushrooms are in sci-fi and how there's lots more potential to unlock.
I'm curious whether their near-term expectation is that this will be better than these models, or whether this is a model they intend to use in Auto mode, or if the focus is really on speed...? I guess my question is why I would actively choose this over Auto?
This is just a completely different use of LLMs and has little to do with working at a real business with a live site and users. Cursor is great when you want to gain understanding of an issue quickly, or resolve something clear and specific quickly.
I'm not against YOLO vibe coding, but being against tab completion is just insane to me. At the end of the day, LLMs help you achieve goals quicker. You still need to know what goal you want to achieve, and tab completion basically lets me complete a focused goal nearly as soon as I determine what my goal is.
No, Claude did not introduce the bugs. I caused the bugs, years ago, and didn't have time to pursue the project for a long time. Claude fixed them by being handed unfinished, broken code and a test suite and told to make the tests pass.