Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Heard all the news how Gemini 3 is passing everyone on benchmarks, so quickly tested and still find it a far cry from ChatGPT in real world use when testing questions on both platforms. But importantly the ChatGPT app experience at least for iPhone/Mac users is drastically superior vs Google which feels very Google still. So Gemini would have to be drastically better answer wise than ChatGPT to lure users from a better UI/UX experience to Gemini. But glad to see competition since certainly don't want only one winner in this race.




That's really fascinating. Every real world use case I've tried on Gemini (especially math-related) absolutely slaughtered the performance of ChatGPT in speed and quality, not even close. As an Android user, the Gemini app is also far superior, since the ChatGPT app still doesn't properly display math equations, among plenty of other bugs.

I have to agree with you but I'll remain a skeptic until the preview tag is dropped. I found Gemini 2.5 Pro to be AMAZING during preview and then it's performance and quality unceremoniously dropped month after month once it went live. Optimizations in favor of speed/costs no doubt but it soured me on jumping ship during preview.

Anthropic pulled something similar with 3.6 initially, with a preview that had massive token output and then a real release with barely half -- which significantly curtails certain use cases.

That said, to-date, Gemini has outperformed GPT-5 and GPT5.1 on any task I've thrown at them together. Too bad Gemini CLI is still barely useful and prone to the same infinite loop issues that have plagued it for over a year.

I think Google has genuinely released a preview of a model that leapfrogs all other models. I want to see if that is what actually makes it to production before I change anything major in my workflows.


It's generally anecdotal and vibes when people make claims that some AI is better than another for things they do. There are too many variables and not enough eval for any of it to hold water imo. Personal preferences, experience, brand loyalty, and bias at play too

it's contemporary vim vs emacs at this point


I get what you're saying because this is typically true (this is a strong motivator for my current research) but I don't think it applies here and OpenAI seems to agree with me. Some cases are clear: GPT-5 is clearly better than Llama 3 for example. If there is a sizeable enough difference across virtually all evals, it is typically clear that one LLM is a stronger performer than another.

Experiences aside, Gemini 3 beats GPT-5 on enough evals that it seems fair to say that it is a better model. This appears in line with public consensus, with a few exceptions. Those exceptions seem to be centered around search.


Try doing some more casual requests.

When I asked both ChatGPT 5.1 Extended Thinking and Gemini 3 Pro Preview High for best daily casual socks both responses were okay and had a lot of the same options, but while the ChatGPT response included pictures, specs scraped from the product pages and working links, the Gemini response had no links. After asking for links, Gemini gave me ONLY dead links.

That is a recurring experience, Gemini seems to be supremely lazy to its own detriment quite often.

A minute ago I asked for best CR2032 deal for Aqara sensors in Norway, and Gemini recommended the long discontinued IKEA option, because it didn't bother to check for updated information. ChatGPT on the other hand actually checked prices and stock status for all the options it gave me.


I would further argue the apps are all like 99% the same. And also work just fine through a browser without installing anything

What do you mean? It renders LaTex fine on Android.

Some LaTeX, but not all, especially for larger equations. I will admit it has gotten a lot better in recent updates, since it seemed thoroughly broken for quite a while in its early days.

I had a problem where ChatGPT rendered math to me from right to left. Sure thing YMMV

You're using paid ChatGPT, set to 5.1 with Thinking?

Not op but yes and yes.

I pay for Claude, Gemini and ChatGPT.

Gemini 3 replaced ChatGPT for me and if things don't change I'll cancel ChatGPT for lack of usefulness.


One might think that benchmarks do not say much about individual usage and that an objective assessment of the performance of AIs is difficult.

At least, thanks to the hype, RAM and SSDs are becoming more expensive, which eats up all the savings from using AI and the profits from increased productivity /s?


> But importantly the ChatGPT app experience at least for iPhone/Mac users is drastically superior vs Google which feels very Google still. So Gemini would have to be drastically better answer wise than ChatGPT to lure users from a better UI/UX experience to Gemini.

Yes, the ChatGPT experience is much better. No, Gemini doesn't need to make a better product to take market share.

I've never had the ChatGPT app. But my Android phone has the Gemini app. For free, I can do a lot with it. Granted, on my PC I do a lot more with all the models via paid API access - but on the phone the Gemini app is fine enough. I have nothing to gain by installing the ChatGPT app, even if it is objectively superior. Who wants to create another account?

And that'll be the case for most Android users. As a general hint: If someone uses ChatGPT but has no idea about gpt-4o vs gpt-5 vs gpt-5.1 etc, they'll do just fine with the Gemini app.

Now the Gemini app actually sucks in so many ways (it doesn't seem to save my chats). Google will fix all these issues, but can overtake ChatGPT even if they remain an inferior product.

It's Slack vs Teams all over again. Teams one by a large margin. And Teams still sucks!


Well I have been using Gemini and ChatGPT side by side for over 6 months now.

My experience is Gemini has significantly improved its UX and performs better that requires niche knowledge, think of some ancient gadgets that have been out of production for 4-5 decades. Gemini can produce reliable manuals, but ChatGPT hallucinates.

UX wise ChatGPT is still superior and for common queries it is still my go to. But for hard queries, I am team Gemini and it hasn’t failed me once


Benchmaxxing galore by lots of teams in this space.

I think it’s entirely possible that AI actually has plateaued, or has reached a point where a jump in intelligence comes at the cost of reliability.

I suspect it's reached the point where the distinguishing quality of one model over the others is only observable by true experts -- and only in their respective fields. We are exhausting the well of frontier questions that can be programmatically asked and the answers checked.

Absolutely this. Strong disagree that progress is plateauing, merely that gains are harder for the general public to perceive and typically come from more advanced means than simply scaling. Math performance in particular is improving at an uncomfortably rapid pace.

AI in general? Not at all. LLM's maybe a little bit, when even Sam Altman said, the progress is logarithmic to the investment. Still, there is progress. And the potential of LLM based agents, where many different models and other technics are mixed in together, we just started to explore.

I've been a paying high volume user of ChatGPT for a while. I found the transition to Gemini to be seamless. I've been pleasantly surprised. I bounce between the two. I'm at about 60% Gemini, 40% ChatGPT.

> But importantly the ChatGPT app experience at least for iPhone/Mac users is drastically superior vs Google which feels very Google still. So Gemini would have to be drastically better answer wise than ChatGPT to lure users from a better UI/UX experience to Gemini.

Opposite is true for a larger market. Gemini is great and available with one button click on most consumer phones. OpenAI will never crack most Android users by this logic of yours


I had a similar experience, signing up for the first time to give Gemini a test drive on my side project after a long time using ChatGPT. The latter has a native macOS client which "just works" integrating with Xcode buffers. I couldn't figure out how to integrate Gemini with Xcode quickly enough so I'm resorting to pasting back & forth from the browser. A few of the exchanges I've had "felt smarter" — but, on the whole, it feels like maybe it wasn't as well trained on Swift/SwiftUI as the OpenAI model. I haven't decided one way or another yet, but those are my initial impressions.

Gemini comes with the 1.99 Google One plan. So I use that

Actually, it comes with the free plan. The $1.99 plan doesn't give you any more AI capabilities. Only at the $19.99/mo plan do you get more.

https://one.google.com/about/#compare-plans


Well then the usage is already so useful in Free mode that I didn’t even notice it. “Thinking ” has a meaningful cap. But I have not felt the need to pay for more. I pay for Claude.

That must've changed very recently then, even just a month ago I'd have Gemini (2.5 Pro) run into a daily limit after just 3-4 messages as a free user.

I bought on Black Friday, so pretty sure I got extended Gemini usage for 1.99

> So Gemini would have to be drastically better answer wise than ChatGPT to lure users from a better UI/UX experience to Gemini.

or cheaper/free


Its really hard to measure these things. Personally I switched to Gemini a few months ago since it was half the cost of ChatGPT (Verizon has a $10/month Google AI package). I feel like I've subconsciously learned to prompt it slightly differently and now using OpenAI products feels disappointing. Gemini tends to give me the answer I expect, Claude follows close behind, I get "meh" results from OpenAI.

I am using Gemini 3 Pro, I rarely use Flash.


What are your primary usecases? Are you mostly using it as a chatbot?

I find gemini excels in multimodal areas over chatgpt and anthropic. For example, "identify and classify this image with meta data" or "ocr this document and output a similar structure in markdown"


Curiously, I had the opposite experience, except for Deep Research mode where after the latest update the OpenAI offering has become genuinely amazing. This is doubly ironic because Gemini has direct API access to Google search!

It is good, but Pro subscribers get only five per month. After that, it’s a limited version, and it’s not good (normal 5.1 gives more comprehensive answers than DR Limited).

Google search is awful. I don't think they can put lipstick on that particular pig and expect anyone to think it's beautiful.

I'm sure they give their AI models a superior search than they give to us.

Also if you prompt Google search the right way it's unfortunately still superior to most if not all other solutions in most cases.


they're deep into a redesign of the gemini app, idk when it will be released or if its going to be good, but at least they agree with you and are putting significant resources into fixing it.

I did notice a bug on the iPhone, even with app background refresh, if the phone shuts off the screen, a prompt that was processing stalls out.

I couldn't even get ChatGPT to let me download code it claimed to program for me. It kept saying the files were ready but refused to let me access or download anything. It was the most basic use case and it totally bombed. I gave up on ChatGPT right then and there.

It's amazing how different people have wildly varying experiences with the same product.


It's because comparing their "ChatGPT" experience with your "ChatGPT" experience doesn't tell anyone anything. Unless people start saying what models they're using and prompts, the discussions back and forth about what platform is the best provides zero information to anyone.

It’s the equivalent of the user that points at their workstation tower and exclaims that the “hard drive is broken!”

Use the right words, get the right response.

Ah… ahhh… I get now why they get such bad results from AI models.


Did you wait a while before downloading? The links it provides for temporary projects have a surprisingly brief window where you can download them. I've had similar experience when even waiting 1 minute to download the file.

Since LLMs are non deterministic it's not that amazing. You could ask it the same question as me and we could both get very different conversations and experiences

The same thing happens to me in Claude occasionally. I have to tell it "Generate a tar.gz archive for me to download".

Training and gaming for the benchmarks is different than actual use.

This is exactly my experience. And it's funny -- this crowd is so skeptical of OpenAI... so they prefer _Google_ to not be evil? It's funny how heroes and villains are being re-cast.

Yeah, hate to say but for me a big thing is i still couldn't separate my Gemini chats into folders. I had ChatGPT export some profiles and history and moved it into Gemini, and 1) when Gemini gave me answers i was more pleased but 2) Gemini was a bit more rigorous on guard rails, which seems a bit overly cautious. I was asking some pretty basic non-controversial stuff.


If I research anything close to controversial, I use Grok. Its no-censorship attitude is great.

I'm confused as well, it hallucinated like crazy

like it seems great, but then it's just bullshitting about what it can do or whatever




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: