
Yes, but it's relatively shit

The Vision API can't even read vertical Japanese text
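
For reference, this is roughly the call in question: a minimal sketch using the public VNRecognizeTextRequest API. Horizontal Japanese works; vertical (tategaki) columns are the problem case the parent describes.

    import Vision
    import CoreGraphics

    // Run Apple's text recognizer on an image. Horizontal Japanese is
    // handled; vertical columns tend to come back empty or scrambled.
    func recognizeText(in cgImage: CGImage) throws -> [String] {
        let request = VNRecognizeTextRequest()
        request.recognitionLevel = .accurate
        request.recognitionLanguages = ["ja-JP"]
        request.usesLanguageCorrection = true
        let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
        try handler.perform([request])
        return (request.results ?? []).compactMap { $0.topCandidates(1).first?.string }
    }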




Fair enough. There are some new OCR APIs in the next macOS release. I wonder if the model has been improved.


They're just a new Swift-only interface to the same underlying behavior; no apparent improvement. I was hoping for more given the visionOS launch, but alas.

What I'm trying now is combining ML Kit v2 with Live Text: Apple's side for the accurate paragraphs of text, then custom-indexing that against the ML Kit v2 output to add bounding rects, and guessing corrections for the parts ML Kit misses or misidentifies (using ML Kit only for bounding rects, on the expectation that it will make mistakes on the text recognition itself).
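
A rough sketch of that combination, assuming the recognizeText(in:) function from the earlier snippet stands in for the Live Text side, and ML Kit's Japanese recognizer supplies the rects. The matchScore heuristic and the align pairing are placeholders of mine, not a tested pipeline:

    import MLKitTextRecognitionJapanese
    import MLKitVision
    import UIKit

    // Bounding rects (plus lower-quality text) from ML Kit v2.
    func mlKitBlocks(from image: UIImage,
                     completion: @escaping ([(text: String, frame: CGRect)]) -> Void) {
        let recognizer = TextRecognizer.textRecognizer(options: JapaneseTextRecognizerOptions())
        let visionImage = VisionImage(image: image)
        visionImage.orientation = image.imageOrientation
        recognizer.process(visionImage) { result, _ in
            completion(result?.blocks.map { ($0.text, $0.frame) } ?? [])
        }
    }

    // Crude placeholder score: shared-character count. A real matcher
    // would want edit distance plus positional ordering.
    func matchScore(_ a: String, _ b: String) -> Int {
        let chars = Set(b)
        return a.filter { chars.contains($0) }.count
    }

    // Pair each accurate line (from recognizeText above) with the ML Kit
    // block whose noisy text overlaps it most, and take that block's frame.
    func align(accurateLines: [String],
               blocks: [(text: String, frame: CGRect)]) -> [(text: String, frame: CGRect)] {
        accurateLines.compactMap { line in
            blocks.max { matchScore(line, $0.text) < matchScore(line, $1.text) }
                  .map { (line, $0.frame) }
        }
    }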

I also investigated private APIs for extracting rects from Live Text. It looks possible; the APIs are there (there are methods or properties that expose bounding rects, as Live Text's functionality obviously requires), but I haven't figured out how to access them yet.
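
FWIW, the usual way in is the ObjC runtime: dump the private object's property list, then read promising ones via KVC. This sketches the technique only; the actual class and key names inside Live Text are unknown to me and would have to be discovered at runtime:

    import Foundation
    import ObjectiveC

    // Print every property a (possibly private) ObjC object exposes.
    func dumpProperties(of object: AnyObject) {
        var count: UInt32 = 0
        guard let props = class_copyPropertyList(object_getClass(object), &count) else { return }
        defer { free(props) }
        for i in 0..<Int(count) {
            print(String(cString: property_getName(props[i])))
        }
    }

    // Once a rect-like property name shows up, KVC can read it without a
    // public header. "someBoundingRectsKey" is a placeholder, not a real key:
    // let rects = analysisObject.value(forKey: "someBoundingRectsKey")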


I feel like text detection is much better covered by the various ML models discussed elsewhere in the comments. Maybe you can combine those with Live Text. I found Tesseract pretty ok for text detection as well, but I don't know if any of the models are good for vertical text.


ML Kit v2 works better with vertical text than Tesseract does



