
I’ve used pikepdf[1] for text processing before. To use it for the task you outline, you’ll probably need to thoroughly investigate how bitmaps can be represented in PDFs. (Or maybe not, if you only need to deal with a known finite set of PDFs or PDF producers.)

[1] https://pikepdf.readthedocs.io/en/latest/



