What evidence do you have that the accuracy right now is bad? UI recognition is generally much easier than photographic recognition. Additionally, per the original point re: boiling the ocean, it is more likely that a person has a CV-based screen reader than that every website they visit has ARIA tags.