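It's around 8KB now, so text longer than roughly 8,000 characters will fail with "414 Request-URI Too Large". The limit applies to the URL-encoded request, so spaces and non-ASCII characters (which expand to several bytes each when percent-encoded) hit it sooner.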
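A minimal sketch of staying under that limit before sending, assuming a hypothetical endpoint `https://example.com/ask?q=` and treating 8192 bytes as the cap (real servers and proxies vary). It percent-encodes the text and binary-searches the longest prefix that still fits. The limit and the function name `fit_text_in_url` are illustrative, not from any real API.

```python
from urllib.parse import quote

# Assumption: ~8KB cap on the full URL; the real limit depends on the server/proxy.
MAX_URL_BYTES = 8 * 1024


def fit_text_in_url(base_url: str, text: str, limit: int = MAX_URL_BYTES) -> str:
    """Return base_url + percent-encoded text, truncating text so the URL fits in limit."""
    if len(base_url) > limit:
        raise ValueError("base URL alone exceeds the limit")
    # quote() encodes each character independently, so the encoded length grows
    # monotonically with the prefix length and binary search is valid.
    lo, hi = 0, len(text)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if len(base_url) + len(quote(text[:mid], safe="")) <= limit:
            lo = mid  # prefix of length mid fits; try longer
        else:
            hi = mid - 1  # too long; try shorter
    return base_url + quote(text[:lo], safe="")
```

For example, 8,000 copies of "é" encode to 48,000 characters (`%C3%A9` each), far over the cap, so the function keeps only about 1,360 of them.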
Also, document.body.innerText contains everything on the page, including links, menus, buttons etc., each on its own line. The LLM will only recognise the page if it saw the same website during training and the site hasn't changed much since then. Arbitrary websites it won't recognise this way, and it will start hallucinating content, because innerText strips away all of the page's structure.
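To illustrate the structure loss, here is a rough approximation of innerText-style flattening using Python's `html.parser`. It is not the browser's actual innerText algorithm (which is CSS-aware and skips hidden elements); it simply keeps text nodes, drops every tag, and emits one chunk per line, so navigation links, a heading, body text and a button all become indistinguishable lines.

```python
from html.parser import HTMLParser


class FlatText(HTMLParser):
    """Rough innerText approximation: keep visible text nodes, drop all tags."""

    def __init__(self) -> None:
        super().__init__()
        self.lines: list[str] = []
        self._skip = 0  # depth inside <script>/<style>, whose text isn't shown

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        # Tag names, hrefs and nesting are discarded; only the text survives.
        if not self._skip and data.strip():
            self.lines.append(data.strip())


def flatten(html: str) -> str:
    parser = FlatText()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.lines)


page = ("<nav><a href='/'>Home</a><a href='/blog'>Blog</a></nav>"
        "<h1>Title</h1><p>Body text.</p><button>Subscribe</button>")
print(flatten(page))
```

The output lists "Home", "Blog", "Title", "Body text." and "Subscribe" on separate lines with nothing marking which is a menu link, a heading or a button.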