With data going forward? Or past data? Because I don't agree with using my past data... (kind of rhetorical, as I deleted my accounts long ago, though I'm sure the data is still there)
Technically, not deleting your data would be a GDPR violation of the right to be forgotten. They could be ignoring that but it would be very expensive if they were found out. Probably doesn't apply to models that have already been trained, but should apply effectively going forward.
Meh, the ToS is unenforceable and VPNs with residential IPs are not that expensive. Sure, it's a little harder to scrape than HN, for example, but everyone trains on Twitter data (along with everything else on the internet).
Oh, it's very enforceable. People have literally been imprisoned for years for using alternate tools, like wget, to access a website. The fact that it isn't enforced most of the time, so that everyone breaks or has broken the CFAA provisions covering ToS, is a feature of the system. Like other such laws, it only gets enforced when you rock the boat and someone rich or powerful enough buys a district attorney to indict you.
I asked because I did google it and I found nothing.
Aaron Swartz was not "imprisoned for years".
Weev "exposed a flaw in AT&T security in June 2010, which allowed the e-mail addresses of iPad users to be revealed.[39] The flaw was part of a publicly-accessible URL, which allowed the group to collect the e-mails without having to break into AT&T's system.[40] Contrary to what it first claimed,[41] the group revealed the security flaw to Gawker Media before AT&T had been notified,[40] and also exposed the data of 114,000 iPad users, including those of celebrities, the government and the military"
That's a pretty silly security breach, but it's still a real security breach. Not comparable to scraping Twitter.
Having to go to court and spend literally months or years facing the possibility of being sentenced to most of the rest of your life in prison is itself a very severe punishment.
I think they're talking about the intention of the user in posting the data. People who post on Twitter and HN understand that their posts are publicly visible, unlike Gmail.