
  How much does it cost to train a cutting edge LLM? Those costs need to be factored into the margin from inferencing.
They don't, though! I can buy hardware off the shelf, host open source models on it, and then charge for inference:

https://parasail.io, https://www.baseten.co





Yes, which is why the companies that develop the models aren't cost viable. (Google and others who can subsidize it at a loss obviously are excepted)

Where is the return on the model development costs if anybody can host a roughly equivalent model for the same price and completely bypass the model development cost?

Your point is in line with the entire bear thesis on these companies.

For any use cases which are analytical/backend oriented, and don't scale 1:1 with number of users (of which there are a lot), you can already run a close to cutting edge model on a few thousand dollars of hardware. I do this at home already.


Open source models are still a year or so behind the SotA models released in the last few months. The price to performance is definitely in favor of open source models, however.

DeepMind is actively using Google’s LLMs on groundbreaking research. Anthropic is focused on security for businesses.

For consumers, a subscription is still a better deal than investing a few grand in a personal LLM machine. There will be a time in the future when diminishing returns narrow this gap significantly, but I’m sure top LLM researchers are planning for this and will do whatever they can to keep their firms alive beyond the cost of scaling.


Definitely

I am not suggesting these companies can't pivot or monetize elsewhere, but the return on developing a marginally better model in-house does not really justify the cost at this stage.

But to your point, developing research, drugs, security audits or any kind of services are all monetization of the application of the model, not the monetization of the development of new models.

Put more simply, say you develop the best LLM in the world, that's 15% better than peers on release at the cost of $5B. What is that same model/asset worth 1 year later when it performs at 85% of the latest LLM?

Already any 2023 and perhaps even 2024 vintage model is dead in the water and close to 0 value.

What is a best in class model built in 2025 going to be worth in 2026?

The asset is effectively 100% depreciated within a single year.

(Though I'm open to the idea that the results from past training runs can be reused for future models. This would certainly change the math)
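
To make the arithmetic above concrete, here is a rough sketch. It assumes straight-line depreciation of the hypothetical $5B training cost over a 12-month useful life, plus a purely illustrative gross margin of $2 per million tokens (that figure is an assumption, not something stated in the thread). The function names are made up for illustration.

```python
def required_monthly_margin(training_cost_usd: float, useful_life_months: int) -> float:
    """Gross margin needed per month to recoup the training cost
    before the model is fully depreciated (straight-line assumption)."""
    if useful_life_months <= 0:
        raise ValueError("useful_life_months must be positive")
    return training_cost_usd / useful_life_months


def tokens_to_recoup(training_cost_usd: float, margin_per_million_tokens_usd: float) -> float:
    """Number of tokens that must be served to recoup the training cost,
    given an assumed gross margin per million tokens."""
    if margin_per_million_tokens_usd <= 0:
        raise ValueError("margin_per_million_tokens_usd must be positive")
    return training_cost_usd / margin_per_million_tokens_usd * 1_000_000


# Hypothetical inputs: $5B training run (from the comment above),
# 12-month useful life, $2 margin per million tokens (illustrative only).
monthly = required_monthly_margin(5e9, 12)
tokens = tokens_to_recoup(5e9, 2.0)
print(f"Margin needed per month: ${monthly:,.0f}")
print(f"Tokens needed to break even: {tokens:.2e}")
```

Changing the useful life or the per-token margin shifts the result linearly, which is why the reuse question in the parenthetical matters so much: a longer effective life for the training spend lowers the monthly margin required.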


For sure, all these companies are racing to have the strongest model, and as time goes on we quickly start reaching diminishing returns. DeepSeek came out at the beginning of this year, blew everyone's minds, and now look at how far the industry has progressed beyond it.

It doesn't even seem like these companies are in a battle of attrition to not be the first to go bankrupt. Watching this would be a lot more exciting if that were the case! I think if there were less competition between LLM developers, they could slow down, maybe.

Looking at the prices of inference for open-source models, I would bet proprietary models are making a nice margin on API fees, but there is no way OpenAI will make their investors whole when they make a few dollars of revenue per million tokens. I am terrified of the world we will live in if OpenAI is able to turn their balance sheet around. I think there's nowhere else that investors want to put their money.


The other nightmare for these companies is that any competitor can use their state-of-the-art model to train another model, as some Chinese models are suspected of doing. I personally think it's only fair, since those companies trained on a ton of data in the first place and nobody agreed to it. But it shows that training frontier models has really low returns on investment.


