Whether or not it is legally wrong to scan OSS code (I think it *is* wrong), there is a time-honored precedent for disallowing automated scanning:

  robots.txt

This is exactly what is needed for source code, and the default (no robots.txt) should be "disallow".
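
To make the "default disallow" idea concrete, here is a rough sketch of how a well-behaved scanner might check a repository before reading it. It uses Python's standard `urllib.robotparser`; the function name `may_scan`, the agent names, and the idea of a repository shipping a robots.txt are all my own assumptions for illustration, not an existing convention.

```python
from typing import Optional
from urllib.robotparser import RobotFileParser


def may_scan(robots_txt: Optional[str], agent: str, path: str) -> bool:
    """Return True only if the repository explicitly permits `agent` to read `path`.

    `robots_txt` is the text of the repository's robots.txt, or None if the
    repository has no such file (hypothetical setup for this sketch).
    """
    if robots_txt is None:
        # The proposed default: no robots.txt means "disallow".
        # (On the Web, the usual convention is the opposite.)
        return False

    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


# Example rules: everyone is denied except one named (hypothetical) bot.
RULES = """\
User-agent: *
Disallow: /

User-agent: goodbot
Allow: /
"""

if __name__ == "__main__":
    print(may_scan(None, "ai-scanner", "/src/main.c"))
    print(may_scan(RULES, "ai-scanner", "/src/main.c"))
    print(may_scan(RULES, "goodbot", "/src/main.c"))
```

Note that this only flips the default for a missing file. If a robots.txt exists but has no rule matching the agent, the standard parser still falls back to "allow", so a real scheme would have to decide what to do in that case too.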

The fact that the Web has already grappled with this moral issue should be a strong hint to the AI people not to take a purely legal stance, but to consider the OSS community whose work they use so heavily.