> I worry that they don't really accurately reflect real world closed source experience because of the inherent selection bias.
As opposed to what, yet another beginner React app? That’s what everyone seems to be testing with but none of the projects I’ve seen are reflective of a production codebase that’s years old and has been touched by a dozen developers.
Throw it at a complicated non-frontend mixed language repo like cxx-qt [1] or something, preferably where the training data doesn’t include the latest API.
> preferably where the training data doesn’t include the latest API
That is the reason LLMs in their current form are pretty useless to me for most tasks.
They happily mix different versions of popular frameworks, and fixing that takes so much manual work that I would rather do it all myself.
Pure (common) math problems, or other domains where the tech has not changed much, like bash scripts or regex, are where I can use them. But my actual code? Not really. The LLM would need to be trained only on the API version I use, and that is not a thing yet, as far as I am aware.
[1] https://github.com/KDAB/cxx-qt