I am not against this lawsuit but I'm against the implications of this because i...

mkeeter · on Nov 3, 2022

Wine literally bans contributions from anyone that has seen Microsoft Windows source code:

https://wiki.winehq.org/Developer_FAQ#Who_can.27t_contribute...

c0balt · on Nov 3, 2022

Well, they are a special case here, however, since they don't solve a specific problem or build a program per se but instead (re)build a program after existing specs. Their explicit goal is to match the behaviour of another piece of software with a translation layer.

Forbidding people who have seen the "source" program is most likely meant to keep their version from going from "matching behaviour" to "behaving like" (as in, being the same code). This might also be intended as a safeguard so that well-intentioned developers don't accidentally break their own (most likely existing) NDAs.

sedatk · on Nov 3, 2022

> A programmer can read available but not oss licensed code and learn from it

Actually, we were forbidden to look at open source code at Microsoft (circa 2009) because it might influence our coding and violate licenses.

EMIRELADERO · on Nov 3, 2022

That was out of an abundance of caution, not based on any legal precedent.

In fact, the little precedent that exists over learning from copyrightable code is in favor of it.

> More important, the rule urged by Sony would require that a software engineer, faced with two engineering solutions that each require intermediate copying of protected and unprotected material, often follow the least efficient solution. (In cases in which the solution that required the fewest number of intermediate copies was also the most efficient, an engineer would pursue it, presumably, without our urging.) This is precisely the kind of “wasted effort that the proscription against the copyright of ideas and facts . . . [is] designed to prevent.” (Sony v. Connectix)

elil17 · on Nov 3, 2022

That demonstrates that copyright laws are already stifling innovation.

Someone · on Nov 3, 2022

It demonstrates that it stifles copying. That may make it easier for the copier to innovate, but doesn’t dispute the main argument for having copyright protection: that, without the protection of copyright, the code wouldn’t have been written.

elil17 · on Nov 3, 2022

I think in the case of open source code, most of it still would have been written if no copyright protections existed.

Someone · on Nov 4, 2022

Most of it? I would think >50% of open source code writers find it necessary to restrict the rights to copy and use their code. In a world without copyright protections, would the GPL be legal?

(and I guess courts might, in the future, say the GPL expires when copyrights on the code expire)

saghm · on Nov 3, 2022

Sure, but given the timetable for changing the law, it still seems pretty reasonable to apply the same standard to Microsoft (and by extension GitHub) in the meantime.

m00x · on Nov 3, 2022

Yeah, that's a good argument that this is a gain to society rather than a loss.

josho · on Nov 3, 2022

I don’t quite agree. Msft took a conservative approach to copyright to protect their own business.

Meanwhile, open source software has had an immeasurable benefit to society. My computer, TV, phone, light bulb, etc. all benefit from OSS, running under various licenses, with only a subset using a copyleft-like license.

elil17 · on Nov 3, 2022

The fact that the laws are inconsistent and expensive to defend against leads companies like Microsoft to take this conservative approach that slows down progress.

HWR_14 · on Nov 3, 2022

That's the goal. To stifle using someone else's work.

Like, copyright laws are also stifling my innovative business creating BluRays of Disney films and selling them on Amazon.

schleck8 · on Nov 3, 2022

Copyright laws aren't preventing you from learning cinematography by watching said Disney movies though, and using all their techniques for your own project.

OpenAI did a dirty job though, judging by the cases of the model just reproducing code down to the comments, so I can understand why one would criticize this specific project.

elil17 · on Nov 3, 2022

That sucks for little snippets of software though, doesn’t it? It’s like copyrighting individual dance moves (not allowed under the current system) and forcing dancers to never watch each other to make sure they’re never stealing.

HWR_14 · on Nov 3, 2022

I mean, it's not like the copyrights are keeping you from doing things. It's stopping you from looking at someone else's source. And it's not like source is easy to accidentally see like dance moves are.

kens · on Nov 3, 2022

Way, way back in 1992, Unix System Laboratories sued BSDI for copyright infringement. Among other things, they claimed that since the BSD folks had seen the Unix source code, they were "mentally contaminated" and their code would be a copyright violation. This led to the BSD folks wearing "mentally contaminated" buttons for a while.

__alexs · on Nov 3, 2022

Do the TypeScript team code with their eyes closed?

sedatk · on Nov 3, 2022

Not sure, TypeScript didn't exist back then :)

eddsh1994 · on Nov 3, 2022

Have you seen some of that codebase? ;)

andrewmcwatters · on Nov 3, 2022

GitHub Copilot has been proven to use code without license attribution. This doesn't need to be as controversial as it is today.

If you're using code and know that it will be output in some form, just stick a license attribution in the autocomplete.

In fact, did you know this is what Apple Books does by default? Say, for example, you copy and paste a code sample from The C Programming Language, 2nd Edition. What comes out? The code you copied and pasted, plus attribution.

swhalen · on Nov 3, 2022

> A programmer can read available but not oss licensed code and learn from it. Thats fair use.

If a human programmer reads someone else's copyrighted code, OSS or otherwise, memorizes it and later reproduces it verbatim or nearly so, that is copyright infringement. If it wasn't, copyright would be meaningless.

The argument, so far as I understand it, is that Copilot is essentially a compressed copy of some or all of the repositories it was trained on. The idea that Copilot is "learning from" and transforming its training corpus seems, to me, like a fiction that has been created to excuse the copyright infringement. I guess we will have to see how it plays out in court.

As a non-lawyer, it seems to me that Stable Diffusion is also on pretty shaky ground.

APIs are not copyrightable (in the US), so Wine is safe (in the US).

Iv · on Nov 3, 2022

AI companies are running against the clock to normalize training against copyrighted data.

Let me tell you the story of Google Books, also known as "Authors Guild Inc. v. Google Inc."

https://en.wikipedia.org/wiki/Authors_Guild,_Inc._v._Google,....

In 2004, Google added copyrighted books to its Google Books search engine, which searches the text of millions of books and shows full-page results without any author's authorization. Any sane lawyer of the time would have bet on this being illegal because, well, it most certainly was. And you may be shocked to learn that it is actually not.

In 2005, the Authors Guild sued over this pretty straightforward copyright violation.

Now an important part of the story: IT TOOK 10 YEARS FOR THE JUDGEMENT TO BE DECIDED (8 years + 2 years appeal) during which, well, tech continued its little stroll. Ten years is a lot in the web world; it is even more for ML.

The judgement decided Google's use of the books was fair use. Why? Not because of the law, silly. A common error we geeks make is to believe that the law is like code and that it is an invincible argument in court. No, the court was impressed by the array of people who were supporting Google, calling it an invaluable tool to find books, one that actually caused many sales to increase, and therefore the harm the laws were trying to prevent was not happening while a lot of good came from it.

Now the second important part of the story: MOST OF THESE USEFUL USES HAPPENED AFTER THE LITIGATION STARTED. That's the kind of crazy world we are living in: the laws are badly designed and badly enforced, so the way to get around them is to disregard them for the greater good, and hope the tribunal won't be competent enough to be fast, but not so incompetent that it fails to understand the bigger picture.

Rants aside, I doubt training data use will be considered copyright infringement if the courts have a mindset similar to the one they had in 2005-2015. Copyright laws were designed to preserve the authors' right to profit from copies of their work, not to give them absolute control over every possible use of every copy ever made.

cromka · on Nov 3, 2022

> A programmer can read available but not oss licensed code and learn from it. Thats fair use. If a machine does it

Quite sure the issue at hand is about the code being copied verbatim without the license terms, not "learning" from it.

bawolff · on Nov 3, 2022

> A programmer can read available but not oss licensed code and learn from it. Thats fair use. If a machine does it, is it wrong ?

You can learn from it, but if you start copying snippets or base your code on it to such an extent that it's clear your work is based on it, things start to get risky.

For comparison, people have tried to get around copyright of photos by hiring an illustrator to "draw" the photo, which doesn't work legally. This situation seems similar.

michaelmrose · on Nov 3, 2022

Why wouldn't drawing the photo be fair use? Can you cite a case?

bawolff · on Nov 4, 2022

It might or might not be depending on the situation. Some of it might come down to intent.

Like, if the drawing was meant to be an artistic rendering with independent artistic value, it's much more likely to be fair use. If the drawing was meant to be a loophole to avoid paying the licensing fee on the original, it's much less likely. Fair use has a bunch of criteria, and a lot of it depends on intention and how the usage would affect the original copyright holder.

I would add that fair use lets you use a copyrighted work; it doesn't make the copyright go away. It just adds some cases where you can use the work notwithstanding the original copyright, but the original copyright is still there.

Note: IANAL, this all could be wrong. I don't have any cases. I do know that people propose this sort of thing at Wikipedia from time to time (i.e. hiring someone to draw copyrighted photos) and it usually gets shot down as not solving the problem, although I'm not familiar with the legal basis.

TimTheTinker · on Nov 3, 2022

At least in legal terms, the difference between humans and machines couldn't be more clear.

amelius · on Nov 3, 2022

> If a machine does it, is it wrong ? What is the line between copying and machine learning ?

What is the difference between a neighbor watching you leave your home to visit the local grocery store and mass surveillance? Where do you draw the line?

It is pretty simple, actually.

kmeisthax · on Nov 3, 2022

Wine/Proton are safe because there is controlling 9th Circuit and SCOTUS precedent in favor of reimplementation of APIs.

Those precedents wouldn't apply to Copilot because it isn't separating out APIs from implementation and just implementing what it needs for the goal of compatibility or "programmer convenience". AI takes the whole work and shreds it in a blender in the hopes of creating something new. The hope of the AI community is that the fair use argument is more like Authors Guild v. Google than Sony v. Connectix.

bogwog · on Nov 3, 2022

> Today they're filing a lawsuit against copilot.

> Tomorrow it will be against stable diffusion or (dall-e, gpt-3 whatever)

> And then eventually against Wine/Proton and emulators (are APIs copyrightable)

Textbook definition of F.U.D.

laputan_machine · on Nov 3, 2022

Genuinely one of the worst takes I've ever read. I'm not against the 'slippery slope' argument in principle, but this example is ridiculous.

mardifoufs · on Nov 3, 2022

Slippery slope? Are you familiar with judicial precedent? Being bound to precedents is central to common law legal systems, so I don't think the GP's take was so outlandish. "Slippery slopes" and "whataboutism" might be thought-terminating buzzwords online, but not in front of a judge.

ImprobableTruth · on Nov 3, 2022

In what way would this even remotely set a precedent for APIs?

Barrin92 · on Nov 3, 2022

> A programmer can read available but not oss licensed code and learn from it. Thats fair use.

No it isn't, at least not automatically, which is why infringement of licenses exists at all. The fact that you have a brain doesn't change that and never has. If you reproduce someone's code you can be in hot water, and that should also be the case for the operator of a machine.

It's also why the concept of a clean room implementation exists at all.

EMIRELADERO · on Nov 3, 2022

I think the commenter you replied to was talking about using the functional, non-copyrightable elements of the copyrighted code. Clean-room is not even required by case law. There's precedent that explicitly calls it out as inefficient.

> More important, the rule urged by Sony would require that a software engineer, faced with two engineering solutions that each require intermediate copying of protected and unprotected material, often follow the least efficient solution. (In cases in which the solution that required the fewest number of intermediate copies was also the most efficient, an engineer would pursue it, presumably, without our urging.) This is precisely the kind of “wasted effort that the proscription against the copyright of ideas and facts . . . [is] designed to prevent.” (Sony v. Connectix)

zbentley · on Nov 4, 2022

> A programmer can read available but not oss licensed code and learn from it. Thats fair use. If a machine does it, is it wrong ?

My (extremely amateur) understanding is that what is meant by "learn from it" is one of the hinge points of the legal question.

If a programmer reads licensed code and reproduces it verbatim or near-verbatim in a project with a conflicting license, that becomes a legal problem in certain circumstances.

If a programmer reads the same code and gets an idea to implement something different, that's less troublesome (or at least, if it is troublesome it's in a different area; if the idea was related to a patentable process, then other questions arise, but I'm even less qualified to speak to that area of law).

There's nothing special about copy/paste buttons that makes them the only way you can infringe copyright.

Fair use doesn't automatically kick in just because someone uses what they took/copied as part of a larger artifact; it's a really complicated legal line.

arpowers · on Nov 3, 2022

In some ways all these AIs are plagiarizing... I think creators should opt in to AI models, as no current license was developed with this in mind.

grayfaced · on Nov 3, 2022

Maybe it's time for the Creative Commons licenses to address this. I'm curious if No-Derivatives would already prohibit this? Does the ND language need tweaking? Or do they need a whole new clause?

Edit: I guess they do address it in their FAQ, and I'd summarize it as "Depends if copyright law applies and depends if it's considered derivative". https://creativecommons.org/faq/#artificial-intelligence-and...

az226 · on Nov 5, 2022

Not for GitHub -- users who upload their code accept GitHub's license agreements, which allow it to use the code in many different ways, including Copilot. Kind of like how when you create a Robinhood account you agree to arbitration and can't sue them.

belorn · on Nov 3, 2022

It would be good to have a definitive and simple line for fair use that could be applied to all forms of copyright. Right now fair use is defined by four guidelines:

The purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes

The nature of the copyrighted work

The amount and substantiality of the portion used in relation to the copyrighted work as a whole

The effect of the use upon the potential market for or value of the copyrighted work.

A programmer who studied in school and learned to code clearly did so for an educational purpose. The nature of the work is primarily facts and ideas, while expression and fixation are generally not what the school is focusing on (obviously some copying of style and implementation could occur). The amount and substantiality of the original works used is likely to be so minor as to be unrecognizable, and the effect of the use upon the potential market when students learn from existing works would be very hard to measure (if it could be detected at all).

When a machine does this, are we going to give the same answers? Its purpose is explicitly commercial. Machines operate on expression and fixation, and the operators can't extract the idea that a model should have learned in order to explain how a given output is generated. Machines make no distinction regarding the amount and substantiality of the original works, with no ability to argue for how they intentionally limited their use of the original work. And finally, GitHub Copilot and other tools like it do not consider the potential market of the infringed work.

APIs are generally covered by the interoperability exception. I am unsure how that is related to Copilot or DALL-E (and the like). In the Oracle v. Google case the court also found that the API in question was neither an expression nor a fixation of an idea. A Copilot that only generated header code could in theory be more likely to fall within fair use, but then the scope of the project would be tiny compared to what exists now.

whateveracct · on Nov 3, 2022

> A programmer can read available but not oss licensed code and learn from it. Thats fair use. If a machine does it, is it wrong ?

Just because both activities are called "learning" does not mean they are the same thing. They are fundamentally, physically different activities.

chiefalchemist · on Nov 3, 2022

Agreed. But it could go the other way as well. Let's say MS / HB wins and the decision establishes an even less healthy / profitable (?) outcome over the long term.

Remember when Napster was all the rage? And then Jobs and Apple stepped in and set an expectation for the value of a song (at 99 cents)? That made music into the razor and the iPod into the much more profitable blades. Sure, it pushed back Napster, but artists, as the creators of the goods, have yet to recover.

I'm not saying this is the same thing. It's not. Only noting that today's "win" is tomorrow's loss. This very well could be a case of be careful what you wish for.

bdcravens · on Nov 3, 2022

In most copyright cases, exposure to the material in question is discussed.