Hacker News new | past | comments | ask | show | jobs | submit login

quick to assert authoritative opinion - yet the one word "better" belies the message ? Certainly there is are more dimensions worth including in the rating?



Certainly, there may be aspects of a particular 7b model which could beat another particular 70b model and greater detail into different pros and cons of different models are worth considering but people are trying to rank models and if we're ranking (calling one "better" than another), we might as well do it as accurately as we can since it can be so subjective.

I see too many misleading "NEW 7B MODEL BEATS GPT-4" posts. People test those models a couple of times, come back to the comments section, declare it true, and onlookers know no better than to believe it and in my opinion has led to many people claiming 7b models have gotten as good as Llama 2 70b or GPT-4 when it is not the case when you account for the overfit being exhibited by these models and actually put them to the test via human evaluation.




Consider applying for YC's Fall 2025 batch! Applications are open till Aug 4

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: