Curious that the metrics [1] comparing Gemini Ultra (not released yet?) with GPT-4 are computed under a different setting depending on the task: "CoT @ 32" for some, "5-shot", "10-shot" or "4-shot" for others, and "0-shot" for others still. That screams cherry-picking to me.
Not to mention that the methodology differs between Gemini Ultra and Gemini Pro for whatever reason (e.g. on MMLU, Ultra uses CoT @ 32 and Pro uses CoT @ 8).
[1] Table 2 here: https://storage.googleapis.com/deepmind-media/gemini/gemini_...