> but since it doesn't have a fundamental understanding of the patterns/objects ...

ffwd · on Nov 4, 2022

Actually, I should have mentioned this in the original post, but I think the "3 arms" thing is kind of a bad example, come to think of it. In general, at least with SD, it's very unlikely to create 3 arms or 8 arms if you, for example, ask for a person. Mostly it looks like a person because the text prompt maps to training data of persons, so the results will generally look like people with 2 arms.

However, where it struggles, I find, is with finer details, and also the _placement_ of things like arms and eyes, and the relationships between them. This, I think, is because it only has a general idea of the shape of persons but no data for the exact specifics, like where the arms, legs, eyes and so on should be placed in an anatomically realistic way. That is where I think the challenge is - the gap between a general pattern of a person and one that is extremely specific yet still general enough that the model can modify and transform it like a real human artist can. I'm not sure that's in the data exactly.