OpenRouter lets you fund a wallet and spend no more than that. Google will let spending run out of control, and they purposely delay the billing console by up to 24 hours, so if you don't track it all yourself you can get hit big, especially if a coding error burns through requests right up to the rate limits.
There are better solutions in the market if you're looking for in-depth observability for LLM inference.
For example, use Requesty (requesty at ai) to get very in-depth analytics, breakdowns and logs. You can also set spend limits, create routing policies, or allow only a subset of models that do not retain data.
Well, OpenRouter also sits as a facade in front of the API calls, so you may not get the full details of the response back from the upstream LLM service. As far as I can tell, the Gemini API returns the token counts in its response, which lets you estimate billing yourself if you want to.
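To make the "estimate billing yourself" idea concrete, here is a minimal sketch. It assumes the response JSON carries a `usageMetadata` object with `promptTokenCount` and `candidatesTokenCount`, which is how I understand the Gemini REST response to look; verify against the current API reference. The prices are made-up placeholders, and `estimate_cost` is a hypothetical helper, not part of any SDK.

```python
from decimal import Decimal

# Placeholder prices in USD per 1M tokens (assumption, NOT real pricing).
# Replace with the current published rates for the model you call.
PRICE_PER_MILLION = {
    "input": Decimal("0.30"),
    "output": Decimal("2.50"),
}


def estimate_cost(usage_metadata: dict) -> Decimal:
    """Estimate the cost of one request from its reported token counts.

    Only prompt and candidate (output) tokens are counted here; any other
    counters the API may report (cached tokens, etc.) are ignored.
    """
    prompt_tokens = usage_metadata.get("promptTokenCount", 0)
    output_tokens = usage_metadata.get("candidatesTokenCount", 0)
    total = (
        prompt_tokens * PRICE_PER_MILLION["input"]
        + output_tokens * PRICE_PER_MILLION["output"]
    )
    return total / Decimal(1_000_000)


# Example: usage block as it might appear in a parsed response body.
usage = {"promptTokenCount": 1000, "candidatesTokenCount": 500}
running_total = estimate_cost(usage)
```

Summing `estimate_cost` over every response gives you a running total that doesn't depend on the billing console catching up, though it is only as accurate as the price table you maintain.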
> they purposely delay the billing console by up to 24 hours
This is about scalability and performance. Billing for as many requests per second as a cloud provider gets can't be done live, without significant performance and reliability degradation.
With OpenRouter, no matter what happens, you can't spend more than you deposited or end up owing more. It's much safer.
> This is about scalability and performance. Billing for as many requests per second as a cloud provider gets can't be done live, without significant performance and reliability degradation.
I don't buy this, for LLMs specifically. For lots of things a cloud provider offers, usage might be aggregated and batched before showing up, but there is no reason your LLM spend should take nearly as long to show as a bank transfer takes to clear. Especially for an estimate, which is how the console already disclaims it.
There are companies that will monitor your cloud spend much faster, and pretty cheaply, and they essentially have to reimplement the whole thing from the outside and keep up with Google's pricing changes through a shadow recreation of the billing system.
And OpenRouter is able to reflect your spend to you immediately; otherwise they couldn't implement their cap. If they can do it, why can't Google, at least for the broad number of customers without custom price agreements?
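For illustration, the prepaid-cap idea boils down to something like the sketch below. This is not how OpenRouter actually implements it (I have no visibility into that); `Wallet` and `InsufficientFunds` are hypothetical names, and the sketch assumes the cost of a request is known at the moment it is charged.

```python
from decimal import Decimal


class InsufficientFunds(Exception):
    """Raised when a charge would exceed the remaining balance."""


class Wallet:
    """Hypothetical prepaid wallet: spend can never exceed the deposit."""

    def __init__(self, deposit: str) -> None:
        self.balance = Decimal(deposit)

    def charge(self, cost: str) -> Decimal:
        """Deduct cost and return the new balance, or refuse the charge."""
        amount = Decimal(cost)
        if amount > self.balance:
            # Reject instead of going negative: the user never owes money.
            raise InsufficientFunds(
                f"need {amount}, only {self.balance} left"
            )
        self.balance -= amount
        return self.balance


wallet = Wallet("5.00")
wallet.charge("3.00")      # accepted, 2.00 remains
try:
    wallet.charge("2.50")  # would overdraw, so it is refused
except InsufficientFunds:
    pass
```

The point is that enforcing a cap like this requires knowing the running balance at request time, which is exactly the near-real-time spend tracking being discussed.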