No thank you.
I put a license to be followed, not to just be disregarded by an AI as "Learning material".
No human perfectly reproduces their learning material no matter what, but Copilot does.
You mean to tell me that no one has ever perfectly replicated an example that they read somewhere? There are only so many ways to write AABB collision, Fibonacci, or any number of other common algorithms. I'm not saying there's nothing to consider, but I'm sure I've perfectly replicated something I read somewhere, whether I'm actively aware of it or not.
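To illustrate the "only so many ways" point, here's a minimal sketch of an AABB (axis-aligned bounding box) overlap test. The names (`Box`, `aabb_overlap`) are made up for this example, but virtually any independent implementation boils down to the same four comparisons, which is why two people who each "learned" it from different books can produce character-for-character identical code.

```python
from dataclasses import dataclass

@dataclass
class Box:
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height

def aabb_overlap(a: Box, b: Box) -> bool:
    """True if the two boxes intersect on both axes.

    Each axis contributes two comparisons; there is essentially
    no other way to express this test.
    """
    return (a.x < b.x + b.w and b.x < a.x + a.w and
            a.y < b.y + b.h and b.y < a.y + a.h)
```

Any variation you'll find in the wild is cosmetic: swapped operand order, `<=` vs `<`, or min/max corners instead of width/height.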
So are you ok with it being illegal for humans to learn from copyrighted books unless they have a license that explicitly allows learning? That does not sound like a pleasant consequence.
Would you use an AI text generator to write a thesis? No: there's a risk a whole chunk of it will be considered plagiarism, because you have no idea what the source of the AI output is, but you do know it was trained on unknown copyrighted material. This has nothing to do with the way humans learn; it's about correct attribution.
There is no technical reason why Microsoft can't respect licenses with Copilot. But that would mean more work and less training input, so they do code laundering and excuse it with comparisons to human learning because making AI seem more advanced than it is has always worked well in marketing.
Edit: And where do you draw the line between "learning" and copying? I can train a network to exactly reproduce licensed code (or books, or movies), just as a human can memorize it given enough time - and both would be considered a copyright violation if used without correct attribution. If you train an AI model on copyrighted data, you will get copyrighted results with random variation, which might be enough to make them unrecognizable if you're lucky.
> Would you use an AI text generator to write a thesis? No, there's a risk a whole chunk of it will be considered plagiarism because you have no idea what the source of the AI output is, but you know it was trained with unknown copyrighted material.
Of course, but that's a separate issue. We're not talking about whether the output of the AI is copyrighted. We're talking about whether it's ok for it to learn from copyrighted material.
Again you can say exactly the same about humans. I am perfectly capable of plagiarising or outputting copyrighted material. That doesn't mean it's illegal to learn from that material, just to output it verbatim.
So the fundamental issue is that it's harder to tell when an AI is plagiarising than it is when you produce something yourself. But that is a technical (and probably solvable) issue, not a legal one. And it's not the subject of this lawsuit.
Here's the thing - the US has well-established laws around copyright that don't consider learning from books a violation of those copyrights. This lawsuit is intended to challenge Copilot as a violation of licensing and isn't a litigation of "how people learn." Your program stole my code in violation of my license - there's a clear legal issue here.
I'd pose a question to you - would it be okay for me to copy/paste your code verbatim into my paid product in violation of your license and claim that I'm just using it for "learning"?
If you cherry-picked sections of my code? I'd have no more issue with it than George R.R. Martin would if you grabbed a few paragraphs out of one of his fantasy books and used them in your novel.
I think they're taking issue with the unauthorized duplication of copyrighted code. That's distinct from learning how to code (which I don't think anyone would claim Copilot is doing), the way people learn from reading a book. If you were to read the book only to copy it verbatim and resell it, you're going to have a bad time.
It's a pleasant consequence for the person who spent years becoming an expert and then writing the book. It's also a pleasant consequence for the people who buy the book, which might not have existed without a copyright system to protect the writer's interests.
AIs are not humans: no human can read _all_ the code on GitHub, certainly not at the scale that MS can, and a human is unlikely to be able to extract profits directly from that code in violation of its licensing.
100% false. There are loads of historical cases of people with eidetic memories reproducing things they've seen with near-complete fidelity, and there's no reason to believe a coder with such a memory would be any different.