
Also seems very impractical to embed this into a deployed product. How can you possibly hope to control and estimate costs? I guess this is strictly meant for R&D purposes.



You can specify the max length of the response, which presumably includes the hidden tokens (sketch below).

I don't see why this is qualitatively different from a cost perspective than using CoT prompting on existing models.
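
For illustration, a minimal sketch of capping spend per request, assuming the OpenAI Python SDK; reasoning tokens count toward max_completion_tokens, and the model name, prompt, and cap here are placeholders:

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    # Cap everything the model generates: visible output plus hidden
    # reasoning tokens both count toward max_completion_tokens, so the
    # cap is the only hard cost control you have per request.
    response = client.chat.completions.create(
        model="o1-preview",  # illustrative model name
        messages=[{"role": "user", "content": "Summarize this contract."}],
        max_completion_tokens=2000,
    )

    print(response.choices[0].message.content)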


For one, you don't get to see any output at all if you run out of tokens during thinking.

If you set a limit, once it's hit you just get a failed request with no introspection into where and why the CoT went off the rails.
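
A hedged sketch of that failure mode, assuming the SDK's finish_reason and usage.completion_tokens_details fields; the model name, prompt, and cap are placeholders. When the cap is exhausted mid-reasoning, the content comes back empty but the reasoning tokens are still billed:

    from openai import OpenAI

    client = OpenAI()

    response = client.chat.completions.create(
        model="o1-preview",  # illustrative model name
        messages=[{"role": "user", "content": "Prove this theorem."}],
        max_completion_tokens=500,  # deliberately tight cap
    )

    choice = response.choices[0]
    if choice.finish_reason == "length" and not choice.message.content:
        # The hidden reasoning tokens are billed as completion tokens
        # even though none of them are visible in the response.
        details = response.usage.completion_tokens_details
        print("No visible output; billed reasoning tokens:",
              details.reasoning_tokens)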


Why would I pay for zero output? That’s essentially throwing money down the drain.


You can’t verify that you’re paying what you should be if you can’t see the hidden tokens.


With conventional models you don't get the activations or the logits either, even though those would be useful.

Ultimately, if the model's output isn't worth what you end up paying for it, then fine, don't pay for it. I don't see why it really matters to you whether OpenAI is lying about token counts or not.


As a single user, it doesn't really matter, but as a SaaS operator I want tractable, ideally predictable, pricing (see the sketch below).

I wouldn’t just implicitly trust a vendor when they say “yeah we’re just going to charge you for what we feel like when we feel like. You can trust us.”
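
One way to make that tractable is to bound worst-case spend per request from the cap itself. A back-of-the-envelope sketch; the prices here are made-up placeholders, not OpenAI's actual rates:

    # Worst-case cost per request when hidden tokens count toward the cap.
    PROMPT_PRICE_PER_TOKEN = 15 / 1_000_000      # assumed $/input token
    COMPLETION_PRICE_PER_TOKEN = 60 / 1_000_000  # assumed $/output token

    def worst_case_cost(prompt_tokens: int, max_completion_tokens: int) -> float:
        """Upper bound on spend: every completion token, visible or
        hidden, is billed, so the cap is the hard limit you control."""
        return (prompt_tokens * PROMPT_PRICE_PER_TOKEN
                + max_completion_tokens * COMPLETION_PRICE_PER_TOKEN)

    # e.g. a 1,000-token prompt with a 4,000-token completion cap:
    print(f"${worst_case_cost(1_000, 4_000):.4f} max per request")  # $0.2550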


They are currently trying to raise money (there's talk of a new $150B valuation), so that may have something to do with it.





