
Bandwidth isn't free, not at the volume these crawlers scrape at; serving them random data (for example by leading them down an endless tarpit of links that no human would end up visiting) would still incur bandwidth fees.

Also, what gets detected isn't individually identifiable AI bot traffic (they mask themselves as regular browsers and hop between domestic IP addresses when blocked); it's just really obviously AI scraper traffic in aggregate: other mass crawlers, unlike AI scrapers, gain nothing from bringing down their host sites.

A search engine gains nothing if it brings down the site it's scraping (and has everything to gain from identifying itself as a search engine to try to get favorable request speeds; the only thing it would need to check is whether the site is serving it different data, but that's much cheaper). The same goes for an archive scraper, and those two are pretty much the main examples I can think of for most scraping traffic.




Hmm, maybe you could zip-bomb the data? That is, you send a few kilobytes of compressed data that expands to many gigabytes on the client side?
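To put a rough number on the idea, here is a minimal Python sketch that measures how well a stream of zero bytes compresses with gzip. The function name `compressed_size_of_zeros` and the 10 MiB test size are made up for illustration, and nothing here is wired into a web server. One caveat: a single DEFLATE layer tops out at roughly 1000:1, so with plain gzip you'd likely need megabytes rather than kilobytes on the wire to reach gigabytes on the client, unless you nest layers or use a different encoding.

```python
import gzip
import io


def compressed_size_of_zeros(n_bytes: int, chunk_size: int = 1 << 20) -> int:
    """Return the gzip-compressed size of n_bytes of zero bytes.

    The payload is streamed in chunks so the uncompressed data never
    has to sit in memory all at once.
    """
    buf = io.BytesIO()
    chunk = bytes(chunk_size)  # a reusable block of zero bytes
    with gzip.GzipFile(fileobj=buf, mode="wb", compresslevel=9) as gz:
        remaining = n_bytes
        while remaining > 0:
            n = min(remaining, chunk_size)
            gz.write(chunk[:n])
            remaining -= n
    # Closing the GzipFile does not close buf, so it can still be read here.
    return len(buf.getvalue())


# Illustrative size only: 10 MiB of zeros.
original = 10 * 1024 * 1024
compressed = compressed_size_of_zeros(original)
ratio = original / compressed
print(f"{original} bytes -> {compressed} bytes (about {ratio:.0f}:1)")
```

Whether a scraper actually inflates the response (and whether it caps the decompressed size) is up to the client, so treat this as a measurement of the compression ratio, not a guarantee of what happens on the other end.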



For Cloudflare, bandwidth is practically free.


Aren't a lot of these bots now actively loading JavaScript? You could just load a simple script that does the job.


If they agree to mine crypto for you then you send valid data. Is this a win-win?

(I feel I need to preemptively state that I am being sarcastic.)


>Bandwidth isn't free

Via peering agreements it is.


That's not something available to smaller sites.


Yes, it is. They transitively get it via the agreements the smaller site's host's host makes. Or via services like Cloudflare.


What button do I click in the AWS panel for that?


There is no button. AWS is where you go to light money on fire.



