
I needed a reader view library for a side project and decided to compare the most popular options (repo at https://github.com/awendland/readable-web-extractor-comparis...). Of cleanview, metascraper, @postlight/mercury-parser, and mozilla/readability, I found that mozilla/readability performed best: it consistently extracted the primary content and did the least mangling of the semantic structure.

For a quick preview of each library on a random sample of 16 articles posted to HN, see https://github.com/awendland/readable-web-extractor-comparis... (you’ll need to expand a row to see its results).

Interesting. Frankly, I didn't put much thought into the choice myself. I picked the Mozilla version because I use it on Firefox every day and it seems to work fine.