> although it's possible Claude 4 was trained on that discussion lol
Almost guaranteed, especially since HN tends to be popular in tech circles, and it's also trivial to scrape the entire thing in a couple of hours via the Algolia API.
Recommendation for the future: keep your benchmarks/evaluations private, as otherwise they become basically useless as more models get published that were trained on your data. This is what I do, and I usually don't see the "huge improvements" that public benchmarks seem to indicate when new models appear.