Let me tell you the story of Google Books, also known as "Authors Guild, Inc. v. Google, Inc.":
https://en.wikipedia.org/wiki/Authors_Guild,_Inc._v._Google,....
In 2004, Google added copyrighted books to its Google Books search engine, which searches the text of millions of books and shows snippets of the matching pages without any authorization from the authors. Any sane lawyer of the time would have bet on this being illegal because, well, it most certainly was. And you may be shocked to learn that it is actually not.
In 2005, the Authors Guild sued over this pretty straightforward copyright violation.
Now an important part of the story: IT TOOK 10 YEARS FOR THE JUDGMENT TO BE DECIDED (8 years + 2 years of appeal), during which, well, tech continued its little stroll. Ten years is a lot in the web world, and it is even more for ML.
The judgment decided that Google's use of the books was fair use. Why? Not because of the law, silly. A common mistake we geeks make is to believe that the law is like code and that it is an invincible argument in court. No, the court was impressed by the array of people supporting Google and calling it an invaluable tool for finding books, one that actually caused many sales to increase. The harm the laws were trying to prevent was therefore not happening, while a lot of good came from it.
Now the second important part of the story: MOST OF THESE USEFUL USES HAPPENED AFTER THE LITIGATION STARTED. That's the kind of crazy world we are living in: the laws are badly designed and badly enforced, so the way to get around them is to disregard them for the greater good, and hope the court is not competent enough to be fast, yet not so incompetent that it fails to see the bigger picture.
Rants aside, I doubt the use of training data will be considered copyright infringement if the courts have a mindset similar to the one they had in 2005-2015. Copyright laws were designed to preserve the authors' right to profit from copies of their work, not to give them absolute control over every possible use of every copy ever made.