Yet we all use web browsers that copy copyrighted text from buffer to buffer all the time. This doesn't even include all of the copying that ISPs perform.
It might be fair to say that the read performed in training has the same character since no human is involved.
The real copyright violation would be using a derived work.
A browser isn't an amalgamation of billions of pieces of other works. A browser executes and renders the code it's served.
Copilot's corpus is quite literally tomes of copyrighted work that are encoded and compressed in its neural network, from which it launders that work to create similar works. Copilot itself, the neural network, is that corpus of encoded and compressed information; you can't separate the two. Copilot stores and distributes that work without any input from rightsholders, and it does so for profit.
A better analogy would be between a browser and a file server filled with copyrighted movies whose operator charges $10/mo for access. In this analogy the browser is just a browser, whereas the file server is the corpus that forms Copilot itself.
the actual copying isn't a problem, it's distribution. if i buy access to a PDF i'm not going to get in trouble for duplicating the file unless i send it to someone else.
when someone uploads their copyrighted text to a web page they are distributing it to whoever visits that page. the browser is just the medium.