How would you ever prove that a model's parameters were generated by specific training data? Couldn't multiple sets of training data produce the same embeddings/parameters? I imagine there could be infinitely many possible training sets that would lead to the same results, depending on the type of model.
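A toy illustration of that point, assuming the simplest possible "model" (an ordinary least-squares line fit, nothing like Copilot's actual training): three different datasets, one of which has no points on the line at all, all produce exactly the same two parameters, slope 2 and intercept 1. The function and dataset names are made up for the example.

```python
def fit_line(points):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Centered sums of squares / cross-products: the only things the fit "remembers"
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Two sets of points lying exactly on y = 2x + 1
dataset_a = [(0.0, 1.0), (2.0, 5.0)]
dataset_b = [(1.0, 3.0), (3.0, 7.0), (5.0, 11.0)]
# None of these points lie on the line, but the errors cancel out
dataset_c = [(0.0, 0.0), (0.0, 2.0), (2.0, 4.0), (2.0, 6.0)]

for name, data in [("a", dataset_a), ("b", dataset_b), ("c", dataset_c)]:
    print(name, fit_line(data))
```

The fit only keeps a few summary statistics of the data, so any dataset with matching statistics maps to the same parameters, and you can't recover which one was used. Whether anything similar holds for a large neural network trained with SGD is a much harder question; this only shows the mapping from data to parameters doesn't have to be one-to-one.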
This does seem like a pretty compelling rebuttal, since the preceding comment suggests that the GPL does nothing to restrict Microsoft's ability to incorporate code into Copilot's model.
https://en.m.wikipedia.org/wiki/BSD_licenses