It's not just a marketing number; it's a main indicator of model size and memory usage. A lot of what's happening now is trying to figure out how 'large' an LLM needs to be to function at a certain level. For instance, it was claimed Llama (65B) had GPT-3 (175B) level performance, and at 65B parameters that's a lot less memory usage. It's a rough, high-level indicator of the computational requirements to run the model.
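To make the memory point concrete, here's a rough back-of-the-envelope sketch. It assumes fp16/bf16 weights (2 bytes per parameter) and ignores activations, KV cache, optimizer state, and quantization, so it's only a lower bound on what you'd actually need.

```python
def inference_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate memory just to hold the weights (fp16/bf16 = 2 bytes per param)."""
    return num_params * bytes_per_param / 1e9

for name, params in [("Llama 65B", 65e9), ("GPT-3 175B", 175e9)]:
    print(f"{name}: ~{inference_memory_gb(params):.0f} GB of weights at fp16")
# Llama 65B: ~130 GB of weights at fp16
# GPT-3 175B: ~350 GB of weights at fp16
```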
Without accounting for data and model architecture, it's not a very useful number. For all we know, they may be using sparse approximations, which would throw this off by a lot. For example, if you compare a fully connected model over N×N images to a convolutional one, the former has O(N^4) parameters while the latter has O(K^2) parameters, for a window size K < N. The count is only useful if you know they essentially stacked additional layers on top of GPT-3.5, which we know is not the case since they added a vision head.
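A toy calculation of that O(N^4) vs O(K^2) gap, with made-up values for N and K and a single one-channel filter on the conv side (no framework needed, just arithmetic):

```python
N = 64   # image side length (hypothetical)
K = 3    # conv window size (hypothetical)

# Dense layer mapping an N*N input to an N*N output: (N^2)^2 = N^4 weights, ignoring bias.
fc_params = (N * N) ** 2
# One K x K conv filter, one input/output channel: K^2 weights, independent of N.
conv_params = K * K

print(f"fully connected: {fc_params:,} params")  # 16,777,216
print(f"convolutional:   {conv_params} params")  # 9
```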
If this is a multi-trillion-parameter model, then you know that replicating it is probably a matter of cranking up the parameter count. If it's a <100M-parameter model, then you know there's some breakthrough they found that you need to figure out, instead of wasting time and money on more parameters.
Maybe it wasn't a parameter-count increase that made any of this possible, but they don't want to give that away. By keeping all developments vague, they make it harder to determine whether they found some novel technique they don't want others to know about.