As a blind person, I dearly miss the textual Internet as it was.
The original Internet -- the textual one -- was perfectly accessible.
The extra markup that HTML later gained to help screen readers was needed because
the web stack had gotten too complex, not because it was too simple.
The lang attribute on the span tag in HTML really helps: it lets a screen reader speak quoted foreign-language text in that language, instead of just pronouncing gibberish. And this tag and attribute go pretty far back into the 1990s; they are not part of the modern-web-gone-mad. So similar markup is one of those little things that Gemini should have thought about from the very beginning, and it would not have made their stack appreciably more complex.
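To make that concrete, here is a rough Python sketch of what a screen reader has to do with lang markup: split the text into runs, each tagged with the language to speak it in. The function name lang_segments, the default of "en", and the short list of void tags are my own assumptions for illustration, not how any particular screen reader works.

```python
from html.parser import HTMLParser

# Tags that never get a closing tag; illustrative subset, not the full list.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class _LangCollector(HTMLParser):
    """Collect (language, text) runs, inheriting lang from enclosing elements."""

    def __init__(self, default_lang):
        super().__init__()
        self.lang_stack = [default_lang]  # innermost language is last
        self.segments = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        # An element without its own lang attribute inherits the current one.
        lang = dict(attrs).get("lang") or self.lang_stack[-1]
        self.lang_stack.append(lang)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and len(self.lang_stack) > 1:
            self.lang_stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.segments.append((self.lang_stack[-1], text))


def lang_segments(html, default_lang="en"):
    """Hypothetical helper: return a list of (language, text) pairs."""
    collector = _LangCollector(default_lang)
    collector.feed(html)
    collector.close()  # flush any trailing text
    return collector.segments


if __name__ == "__main__":
    sample = '<p>She said <span lang="fr">bonjour tout le monde</span> and left.</p>'
    for lang, text in lang_segments(sample):
        print(lang, "|", text)
```

The only state needed is a stack of languages, which is the point: this kind of markup costs a parser almost nothing.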
Foreign-language text could potentially be auto-detected. You could either use some sort of neural-network-based approach, or use an algorithm like the one that browsers use to guess a page's text encoding when it is not declared.
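As a very rough illustration of the non-NN route, here is a Python sketch that guesses a language by counting hits against tiny per-language lists of common words. This is a stand-in heuristic of my own, not the algorithm browsers use; the word lists, the guess_language name, and the fallback default are all assumptions, and a real detector would need far larger profiles and would still struggle with short quotes.

```python
import re

# Tiny illustrative word lists (assumption: real profiles would be much larger).
COMMON_WORDS = {
    "en": {"the", "and", "of", "to", "is", "in", "that", "it"},
    "fr": {"le", "la", "les", "et", "de", "est", "un", "une", "que"},
    "de": {"der", "die", "das", "und", "ist", "nicht", "ein", "eine", "zu"},
}


def guess_language(text, default="en"):
    """Return the language whose common words appear most often in text.

    Falls back to default when no listed word appears at all.
    """
    # Split into runs of letters, ignoring digits, underscores and punctuation.
    words = re.findall(r"[^\W\d_]+", text.lower())
    scores = {
        lang: sum(word in common for word in words)
        for lang, common in COMMON_WORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else default


if __name__ == "__main__":
    for quote in ("Le chat est sur la table",
                  "Der Hund ist nicht zu Hause",
                  "The cat is in the garden"):
        print(guess_language(quote), "|", quote)
```

Even something this crude hints at the trade-off: guessing works tolerably on whole sentences, but a single quoted word gives it nothing to go on, which is where explicit markup wins.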