Whether or not it is legally wrong to scan OSS code (I think it is wrong), there is a time-honored precedent for disallowing automated scanning:
robots.txt
This is exactly what is needed for source code, and the default (no robots.txt) should be "disallow".
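
To make the "default disallow" idea concrete, here is a rough sketch in Python. It reuses the standard library's robots.txt parser, but the function `may_scan`, the agent name `example-bot`, and the idea of a per-repository policy file are all hypothetical; no such convention exists for source code today. Note that on the Web a missing robots.txt is normally treated as "allow everything", which is the opposite of what is proposed here.

```python
from typing import Optional
from urllib.robotparser import RobotFileParser


def may_scan(policy_text: Optional[str], agent: str, path: str) -> bool:
    """Return True only if a policy file exists and permits this agent.

    policy_text is the content of a hypothetical robots.txt-style file
    in the repository, or None if the repository has no such file.
    """
    if policy_text is None:
        # Proposed default: no policy file means "disallow".
        return False
    parser = RobotFileParser()
    parser.parse(policy_text.splitlines())
    return parser.can_fetch(agent, path)


# Hypothetical policies a repository owner might publish.
opt_in = "User-agent: *\nAllow: /\n"
partial = "User-agent: *\nDisallow: /src/\n"

print(may_scan(None, "example-bot", "/src/main.c"))     # no policy file
print(may_scan(opt_in, "example-bot", "/src/main.c"))   # explicit opt-in
print(may_scan(partial, "example-bot", "/src/main.c"))  # path excluded
```

One caveat with this sketch: once a policy file is present, the standard parser still allows any path that no rule matches, so a strict opt-in scheme would also need to flip that inner default.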
The fact that the Web has already considered this moral issue should be a strong hint for the AI people not to take a purely legal stance, but to consider the OSS community whose work they are using so heavily.