
> So, this is going to have new different issues.

Well, yeah, it's a bigger set of models (particularly the language model) that takes more resources (both to train and for inference). That's the tradeoff.

> Here’s my question: are there any image models where, if you prompt “1+1”, you get an image showing “3”?

You want a t2i model that does arithmetic in the prompt, translates it to "text displaying the number <result>", but also does the arithmetic wrong?

Yeah, I don’t think that combination of features is in any existing model or, really, in any of the datasets used for evaluation, or otherwise on anyone’s roadmap.




Pretend I wrote 2, edit timeout closed.

"Actually thinking about your prompt" is a necessary part of being able to make the prompts natural language instead of a long list of fantasy google image search terms.

A useful example is "my bedroom but in a new color", but some things I've typed into Midjourney that don't work include "a really long guinea pig" (you get a regular-size one), "world's best coffee" (the coffee cup gets a world on it), etc. It's just too literal.

And yes, preprocessing with an LLM could do this.
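To make the idea concrete, here's a toy sketch of that preprocessing step, with a regex-based arithmetic pass standing in for the LLM (a real pipeline would call a language model here; the function names are just for illustration):

```python
import re

def preprocess_prompt(prompt: str) -> str:
    """Toy stand-in for an LLM preprocessing pass: resolve simple
    arithmetic in the prompt into a literal rendering instruction
    that a t2i model can handle."""
    def resolve(match: re.Match) -> str:
        a, op, b = int(match.group(1)), match.group(2), int(match.group(3))
        result = a + b if op == "+" else a - b
        return f"text displaying the number {result}"
    return re.sub(r"(\d+)\s*([+-])\s*(\d+)", resolve, prompt)

print(preprocess_prompt("1+1"))  # -> text displaying the number 2
```

The point is just that the reasoning happens before the image model ever sees the prompt, so the t2i model only has to render literal text.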


I don't think they're saying that's a goal; I think they're curious whether it's the case. LLMs are bad at arithmetic, this uses an LLM to process the prompt, so that class of result seems plausible.



