They probably asynchronously verify that the IP address actually belongs to Googlebot, then ban the IP when it fails.
Verifying it synchronously would probably be too slow.
You can verify Googlebot authenticity by doing a reverse DNS lookup, then checking that the reverse DNS name resolves back to the expected IP address[0].
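For anyone curious, the reverse-then-forward check can be sketched roughly like this. It is a minimal sketch, not what any particular site runs: the function names are made up for illustration, the hostname suffixes are my reading of Google's documentation (check the linked page), and the resolvers are injectable only so the logic can be exercised without touching the network.

```python
import socket

# Assumption: suffixes taken from my reading of Google's crawler docs; verify before relying on them.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com", ".googleusercontent.com")


def reverse_lookup(ip):
    """Reverse DNS (PTR) lookup: IP address -> primary hostname."""
    return socket.gethostbyaddr(ip)[0]


def forward_lookup(host):
    """Forward DNS lookup: hostname -> set of address strings."""
    return {info[4][0] for info in socket.getaddrinfo(host, None)}


def is_verified_googlebot(ip, reverse=reverse_lookup, forward=forward_lookup):
    """Return True only if the PTR name is a Google host AND it resolves back to ip."""
    try:
        host = reverse(ip).rstrip(".").lower()
    except OSError:  # socket.herror / socket.gaierror are OSError subclasses
        return False
    # The leading dot in each suffix stops names like "fakegooglebot.com" from matching.
    if not host.endswith(GOOGLE_SUFFIXES):
        return False
    try:
        # Plain string comparison: fine for IPv4, IPv6 would want normalizing first.
        return ip in forward(host)
    except OSError:
        return False
```

Both lookups block on the resolver, which is why doing this inline on every request is unattractive unless the results are cached.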
Which leads to the possibility of triggering a self-inflicted DoS. I am behind a CGNAT right now. You reckon that if I set myself to Googlebot and loaded NYT, they'd ban the entire o2 mobile network in Germany? (or possibly shared infrastructure with Deutsche Telekom - not sure)
Not to mention the possibility of just filling up the banned IP table.
Hypothetically if they were doing that, they’d only be ‘banning’ that mobile network in the ‘paywall-relaxing-for-Googlebot’ code - not banning the IP traffic or serving a 403 or anything. They ordinarily throw paywalls at those users anyway.
There are easily installable databases of IP block info, so it's super easy to do it synchronously, especially if it’s stored in memory. I run a small group of servers that each have to do it thousands of times per second.
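To illustrate why the in-memory version is cheap, here is a rough sketch of a CIDR block table with a binary-search lookup. The class name is hypothetical and the ranges in the usage example are documentation-only placeholders, not real crawler ranges; a real setup would load them from whatever database you install.

```python
import bisect
import ipaddress


class IPv4BlockTable:
    """In-memory set of IPv4 CIDR blocks with O(log n) membership checks."""

    def __init__(self, cidrs):
        nets = [ipaddress.IPv4Network(c) for c in cidrs]
        # Merge overlapping/adjacent blocks so the ranges are disjoint, then sort.
        merged = sorted(ipaddress.collapse_addresses(nets))
        self._starts = [int(n.network_address) for n in merged]
        self._ends = [int(n.broadcast_address) for n in merged]

    def __contains__(self, ip):
        value = int(ipaddress.IPv4Address(ip))
        # Find the last block whose start is <= value, then check its end.
        i = bisect.bisect_right(self._starts, value) - 1
        return i >= 0 and value <= self._ends[i]


# Placeholder ranges (RFC 5737 documentation blocks), not real crawler ranges.
table = IPv4BlockTable(["192.0.2.0/24", "198.51.100.0/25"])
print("192.0.2.17" in table)
```

No DNS and no I/O on the request path, just two list lookups per check, so thousands of checks per second per server is not a stretch.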
[0]: https://developers.google.com/search/docs/crawling-indexing/...