> But the problem is that these models don't exist in a vacuum, and have to go against slightly larger ones that are also compute optimal and use more data, which will definitely perform better.
They don't have to go against those, though. Most of these models are research models, either from academia or from companies experimenting to see what works. From my understanding, most of these boil down to: "We have X amount of USD for the next month or so; we'll try a few things, then stick the rest of the time out on whatever our best bet is."
Very few companies have the resources to train big models with as much compute as Google/OpenAI/Microsoft/Facebook.
These are also not being monetized as they're open source.
Going from their 2.7B model to 10B would require ~10X the compute (FLOPs) for a compute-optimal model. And this is likely their first open model, not their last; since Replit likely doesn't have the budget OpenAI does, it makes sense they didn't want to blow their entire year's budget on their first open model.
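For a rough sense of where that estimate comes from, here is a sketch using the common rule-of-thumb approximations C ≈ 6·N·D training FLOPs and a compute-optimal ("Chinchilla"-style) tokens-to-parameters ratio of about 20:1. Both the 6·N·D formula and the 20:1 ratio are assumptions for illustration, not figures from the comment above; under them the 2.7B → 10B ratio works out to (10/2.7)² ≈ 13.7X, the same ballpark as the ~10X above.

```python
def optimal_training_flops(n_params, tokens_per_param=20):
    """Approximate FLOPs to train a compute-optimal model of n_params.

    Assumes C ~ 6*N*D and that optimal data scales linearly with model
    size (tokens_per_param is a rough Chinchilla-style ratio).
    """
    d_tokens = n_params * tokens_per_param  # optimal token count grows with N
    return 6 * n_params * d_tokens          # C ~ 6 N D

c_small = optimal_training_flops(2.7e9)  # 2.7B-parameter model
c_big = optimal_training_flops(10e9)     # 10B-parameter model

print(f"2.7B: {c_small:.2e} FLOPs")
print(f"10B:  {c_big:.2e} FLOPs")
print(f"ratio: {c_big / c_small:.1f}x")  # ~ (10/2.7)^2, i.e. ~13.7x
```

Because optimal data scales with model size, compute scales roughly with the square of the parameter count, which is why a ~3.7X bigger model costs an order of magnitude more to train.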
2.7B would also be really nice if anyone can get it working, because a model that size is more likely to be able to run in the IDE itself instead of needing a massively scaled cloud backend (which might be valuable for Replit).