How real is it though? This blog post says

> In this post, we’ll explore some of the prompting approaches we used in our Hands on with Gemini demo video.

which makes it sound like they used text + image prompts and then acted them out in the video, as opposed to Gemini interpreting the video directly.

https://developers.googleblog.com/2023/12/how-its-made-gemin...




After reading this blog post, I think that hands-on video is just straight-up lying to people. For the boxcar example, the narrator in the video says to Gemini:

> Narrator: "Based on their design, which of these would go faster?"

Without even specifying that those are cars! That was impressive to me, that it recognized the cars are going downhill _and_ could infer that in such a situation, aerodynamics matters. But the blog post says the real prompt was this:

> Real Prompt: "Which of these cars is more aerodynamic? The one on the left or the right? Explain why, using specific visual details."

They narrated inaccurate prompts for the Sun/Saturn/Earth example too:

> Narrator: "Is this the right order?"

> Real Prompt: "Is this the right order? Consider the distance from the sun and explain your reasoning."

If the narrator actually read the _real_ prompts they fed Gemini in these videos, this would not be as impressive at all!


Out of curiosity I've asked GPT-4V the same questions:

    Prompt: [attached image of the post-it notes with sun and planets] Is this right order?
    GPT-4: The image you've uploaded shows three drawings on individual pieces of paper. From left to right, there's a drawing of the sun, then what appears to be Saturn, and lastly a drawing that resembles Earth. If the intention is to display these celestial objects by their proximity to the Sun, then the order should be the Sun, Earth, and then Saturn, as Earth is closer to the Sun than Saturn. Therefore, the current order is not correct. To correct the order, you should place the Earth drawing second, between the Sun and Saturn.

    Prompt: [attached the image of the two car drawings] Based on their design, which one would go faster?
    GPT-4: The image shows two drawings of vehicles on slopes. The one on the left appears to have a more traditional box-like design, while the one on the right has a streamlined design with a pointed front. Based on principles of physics and aerodynamics, the vehicle with the streamlined design (on the right) would likely go faster down the slope. Streamlining reduces air resistance (drag), allowing the vehicle to move more efficiently. The pointed front helps to cut through the air and minimize the drag force, which would result in higher speeds compared to the boxier design on the left, which would encounter more air resistance.

I'm actually pretty impressed by how well it did with such basic prompts.


What do you mean "Real Prompt"? Nowhere does it say these are the real prompts; it says:

> In this post, we’ll explore some of the prompting approaches we used in our Hands on with Gemini demo video.

Not "here are the full prompts used in the video" or something like that.

None of the entries match up 1:1. And the response to the car example in the video doesn't even make sense in response to the prompt in the post (no mention of speed), and certainly isn't a trimmed portion of the response in the post.

The video has the disclaimer "For the purposes of this demo, latency has been reduced and Gemini outputs have been shortened for brevity". It would be weird to write that but not mention that neither the prompts nor the responses shared even the same words in the same order as the "Real" prompts and responses.

I think your assumption is wrong on this one.


Wow I was blown away when I watched this video.

Now that I've learned how fake it is, that is more evidence that Google is in really bad shape with this.


>If the narrator actually read the _real_ prompts they fed Gemini in these videos, this would not be as impressive at all!

It's crazy that this is where we are now. This is obviously still crazy impressive even if they hadn't done those edits.


It might still be crazy impressive, but nonetheless, going forward we now know that we cannot trust Google's videos about it, as they're heavily edited to make it look a lot more impressive than it is.


Those prompts aren't far off, but I still don't know how realistic the demo is. Until a product is in my hands, as far as I'm concerned it doesn't exist.


A lesson in how to commit securities fraud and get away with it.


Boo! Complete marketing garbage. May as well have been a Flash demo.


Yeah I think this comment basically sums up my cynicism about that video.

It's that you know some of this happened, but you don't know how much. So when it says "what the quack!", presumably the model was prompted with something like "give me answers in a more fun conversational style" (since that's not the style in any of the other clips). Was it able to do that with just a little hint, or did it take a large amount of wrangling, like "hey can you say that again in a more conversational way, what if you said something funny at the beginning like 'what the quack'"? In the latter case it's totally unimpressive. I'm not saying that's what happened; I'm saying that because we know we're only seeing a very fragmentary transcript, I have no way to distinguish between the really impressive version and the really unimpressive one.

It'll be interesting to use it more as it gets more generally available though.


You can see the cracks in the feature early on:

"What do you think I'm doing? Hint: it's a game."

Anyone with as much "knowledge" as Gemini ought to know it's roshambo.

"Is this the right order? Consider the distance from the sun and explain your reasoning."

Full prompt elided from the video.


I’ve heard of roshambo mostly from South Park

https://www.urbandictionary.com/define.php?term=roshambo

I’ve vaguely heard the term before, but I don’t know what regions of the world actually use that term.


Never heard it called that. Curious where you are from?


For what it's worth, I was born and raised in the Bay Area (in the 90s), and we called it ro-sham-bo growing up. Although it's incredibly strange to see that word in writing, I would always call it rock paper scissors if I were to write it.


It's always like this, isn't it? I was watching the demo and thought: why ask it what "duck" is in multiple languages? Siri can do that right now, and it's not an AI model. I really do think we're getting there with the AI revolution, but these demos are far from exciting. They're just mundane dummy tasks that lack the nuance of the things we actually deal with and would need an AI's help with.


How do you know though? The responses in the video were not the same as those in the blog post.



