
Someone made a tool a few years ago that basically unmasked all HN secondary accounts with a high degree of certainty. It scared the shit out of me how easily it picked out my alts based on writing style.




I think that original post was taken down after a short while, but antirez was similarly nerd-sniped by it and posted this, which I keep a link to for posterity: https://antirez.com/news/150

"Well, the first problem I had, in order to do something like that, was to find an archive with Hacker News comments. Luckily there was one with apparently everything posted on HN from the start to 2023, for a huge 10GB of total data. You can find it here: https://huggingface.co/datasets/OpenPipe/hacker-news and, honestly, I’m not really sure how this was obtained, if using scarping or if HN makes this data public in some way."

This is funny to me in a number of ways. I doubt anyone would be interested in post-2023 data dumps, for fear they would be too contaminated with content produced by LLMs. It's also funny that the archive was hosted by Hugging Face, which just removes any sliver of doubt they scarped (sic) the site.


"Show HN: Using stylometry to find HN users with alternate accounts" (2022), 500 comments, https://news.ycombinator.com/item?id=33755016

> with a high degree of certainty

No, it didn't. As the top comment in that thread points out, there were a large number of false positives.



