
"It just spits out the training set in random configurations (ish)." is a pretty gross misrepresentation and oversimplification of how such a model works, akin to saying a human artist only spits out whatever they saw earlier in their life in random configurations, or saying that SD only spits out pixel values it has seen before, or combinations of pixel values that form edges, etc.

FWIW I don't think there is anything particularly wrong with the model architectures or training data that fundamentally makes it impossible to always get 2 arms. After all, lots of other tricky things are almost always correct. I suspect it's mostly a question of training time and model size (not trivial of course, as it's still expensive to re-train to check modified architectures etc). In the case of SD, it's also a matter of the number of diffusion sampling iterations and the choice of sampler at inference time.




I get your point, but I also think it depends on what you mean by oversimplification. Of course there is _a lot_ of stuff going on, and things like SD capture all kinds of information, not just what I described. However you want to describe it, though, I don't think we are anywhere close to capturing all the "constraints" and real-life knowledge needed to create perfectly realistic images, with all the details and all the higher abstractions correct. Also, it's not only about always getting 2 arms; it's about getting, at the same time, 2 ears, 2 eyes, perfect pupils, perfect fingers, perfect trees, perfect chairs, and so on, all simultaneously (at least if it is to be used in the mainstream). You get my point.

I also don't think there's anything wrong with the model architectures themselves or with the data, nor that it is impossible, only that it is hard. As you say, I think it needs a lot of data and clever engineering to fix the mistakes. It may even be possible to fix most mistakes over time, which would be pretty impressive imo, but the absolute limit of what a model can produce/"contain" on our hardware is still an open, though interesting, question.



