mkaic on April 28, 2023 | on: DeepFloyd IF: open-source text-to-image model

While it's not publicly available yet, I strongly suspect that multimodal GPT-4 may actually be SOTA in image-to-text. The examples shown in the Sparks of AGI paper were extremely impressive imo, though of course those were cherry-picked, so it's unclear how well the model will perform on non-cherry-picked images.