Here's a version of that which should work for any page, whether or not they all...

thih9 · 2024-12-31T06:59:20 1735628360

I’m sure OpenAI likes this use case, a neat way to access data where they are otherwise blocked.

Personally, I’d worry about using this accidentally with sensitive data (e.g. logins).

I do like the idea though, I’d use this with a local model.

hu3 · 2024-12-31T05:52:30 1735624350

Nice! Thank you!

I just wonder if browsers will limit the amount of characters in URLs.

If memory serves me, there was a limit. But it might be high enough to work for most pages.

gloosx · 2024-12-31T06:21:54 1735626114

It's around 8KB now – so text bigger than 8 thousand characters will return: "414 Request-URI Too Large".

Anyway, document.body.innerText contains everything on the page, including links, menus, buttons, etc., one per line. An LLM will only recognise the page if it previously scanned the same website and the site has not changed much since the last training set. It will not recognise an arbitrary website this way and will start hallucinating one, because innerText strips all the structure from the page.

a57721 · 2024-12-31T06:36:43 1735627003

Modern browsers are not the issue here, e.g. Chromium allows 2MB; the issue is the web server's limits.
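
If the server-side limit is the bottleneck, one workaround is to trim the text before putting it in the URL. A rough sketch, not tested against any real service: `buildUrlWithinLimit` is a made-up helper name, the `example.com` base URL is a placeholder, and the 8000 default just mirrors the ~8KB figure mentioned above (actual server limits vary).

```javascript
// Hypothetical helper: keep the longest prefix of `text` whose percent-encoded
// form, appended to `baseUrl`, stays within `maxLength` characters.
// Assumes baseUrl is already ASCII / percent-encoded, so characters == bytes.
function buildUrlWithinLimit(baseUrl, text, maxLength = 8000) {
  const build = (s) => baseUrl + encodeURIComponent(s);
  const full = build(text);
  if (full.length <= maxLength) return full;

  // Split by code point so a surrogate pair (e.g. an emoji) is never cut in half,
  // which would make encodeURIComponent throw a URIError.
  const chars = Array.from(text);

  // Binary search for the longest prefix that still fits.
  let lo = 0;
  let hi = chars.length;
  while (lo < hi) {
    const mid = Math.ceil((lo + hi) / 2);
    if (build(chars.slice(0, mid).join("")).length <= maxLength) {
      lo = mid;
    } else {
      hi = mid - 1;
    }
  }
  // If even the bare baseUrl is over the limit, this returns baseUrl unchanged.
  return build(chars.slice(0, lo).join(""));
}

// In a bookmarklet this might be called as (placeholder URL):
// location.href = buildUrlWithinLimit("https://example.com/?q=", document.body.innerText);
```

Note that percent-encoding inflates the text (a space becomes 3 characters, an emoji 12), so the budget has to be checked on the encoded URL, not on the raw innerText length.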

hu3 · 2024-12-31T08:21:07 1735633267

Indeed, I'm getting Cloudflare error "414 Request-URI Too Large" for this HN post which isn't large.

But the URL bar was not the problem.