In this case, it's more like "find a bunch of images that have been associated with the word 'watermelon', then generate some pixels that kind of look like those images". That's not a very high level of reasoning.
[Edit: Though I guess it's somewhat decent to realize that, given the request for a button that "looks like X", it needs to go find images of X.]