This assumes the team deploying the RAG-based solution has equal ability to either engineer a RAG-based system or finetune an LLM. Those are different skillsets, and even selecting which LLM to finetune is a complex question, let alone aligning it, deploying it, optimizing inference, etc.
The budget question comes into play as well. Even if the same text is repeatedly fed to the LLM, that cost is spread over a long enough time that it can be financially more accessible than finetuning, which is a kind of capex paid upfront.
Now bear in mind, I'm a big proponent of finetuning where applicable and I try to raise awareness of the possibilities it opens. But one cannot deny RAG is a lot more accessible to teams that are more likely developers / AI engineers than ML engineers/researchers.
You are certainly right, managed platforms make finetuning much easier. But managed/closed-model finetuning is pretty limited and should really be called “distribution modeling” or something.
Results with this method are significantly more limited than all the power open-weight finetuning gives you (and the skillset it demands in return).
And in either case, don’t forget alignment and evals.
> Results with this method are significantly more limited than all the power open-weight finetuning gives you (and the skillset it demands in return).
I am not sure I understand why you are so certain that finetuned top-market models, built by top researchers, will be significantly worse than whatever open-source model you pick.
It's easy to bet: more and more people are switching to chatbots for tasks they previously used search for, and this can dramatically affect Google's main revenue stream, Search Ads.
I don't think it's going to happen often, but it definitely happens. We see it more than anyone else would because we run a lot of different workload types, any possible combination of types/SQL/engines...
Running Postgres at my previous company, we saw a few of them as well, and I'd consider Postgres a "production-grade" system.
It depends on your data access pattern. If some text goes through the LLM's input many times, it is more efficient to finetune the LLM on it once.
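A rough back-of-envelope version of that tradeoff, as a sketch (every number below is a hypothetical placeholder, not real pricing):

    # Break-even between re-sending the same context on every call and
    # finetuning on it once. All figures are made-up placeholders.
    CONTEXT_TOKENS = 50_000          # shared text resent with every request
    PRICE_PER_INPUT_TOKEN = 3e-6     # $/token, hypothetical API rate
    FINETUNE_COST = 500.0            # one-time training cost, hypothetical

    cost_per_call = CONTEXT_TOKENS * PRICE_PER_INPUT_TOKEN   # $0.15/call
    break_even = FINETUNE_COST / cost_per_call

    print(f"re-sending context: ${cost_per_call:.2f} per call")
    print(f"finetuning breaks even after ~{break_even:,.0f} calls")

With these placeholders that's roughly 3,300 calls: below that volume (or if the text changes frequently), stuffing the context stays cheaper, and above it the one-time finetuning "capex" wins, modulo serving costs for the finetuned model.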