
I doubt any "anti-scraper" system will actually work.

But if one is found, it will pave the way for a very dangerous counter-attack: browser vendors with a need for data (i.e. Google) simply using the vast fleet of installed browsers to do this scraping for them. Chrome, Safari, Edge, sending the pages you visit to their data centers.


This feels like it was already half happening anyway, so it isn't too big of a leap.

I also think this is the endgame of things like Recall in windows. Steal the training data right off your PC, no need to wait for the sucker to upload it to the web first.


This is why we need https://ladybird.org/

I've done that as well. The PoC worked, but the statelessness did prove a hurdle.

It enforces a pattern in which a client must do the PoW on every request.

Other difficulties uncovered in our PoC were:

Not all clients are equal: this punishes an old mobile phone or Raspberry Pi much more than a client running on a beefy server with GPUs, or clients running on compromised hardware. I.e. real users are likely punished the most, while illegitimate users are often punished the least.

Not all endpoints are equal: we experimented with higher difficulties for e.g. POST/PUT/PATCH/DELETE over GET, and with different difficulties for different endpoints, attempting to match how expensive a call would be for us. That requires a back-and-forth to exchange difficulties.

It discourages proper HATEOAS or REST, where a client browses through the API by following links, and encourages calls that "just include as much as possible in one query", diminishing our ability to cache, to be flexible, and to leverage good HTTP practices.


That's a perfect tool for monopolists to widen their moat even more.

In line with how email is technically still federated and distributed, but practically oligopolized by a handful of big-tech companies, through "fighting spam".


That's not true for the vast amount of creative-commons, open-source and other permissively licensed content.

(Aside: those licenses and that mode of distribution were advocated by many of the same demographic (information-wants-to-be-free folks, JSTOR protestors, GPL zealots) that now opposes LLMs using that content.)


> GPL-zealots

I'm sure GPL zealots would be happier about this situation if LLM vendors abided by the spirit of the license by releasing their models under GPL after ingesting GPL data, but we all know that isn't happening.


Sounds like something IPFS could be a nice solution for.

> Why would you threaten your users with that?

Your users - we, browsing the web - are already threatened with this. Adding a PoW changes nothing here.

My browser already has several layers of protection in place. My browser even allows me to improve this protection with addons (uBlock etc.), and my OS adds even more protection on top. This is enough to allow legitimate PoW but block malicious code.


Not safety-conscious users who disable javascript.

Those aren't threatened by PoW or malicious versions thereof either.

Nor Chrome(ium)/Linux.

WebGPU is experimental in Firefox on all platforms, but especially on Linux. Chrome on Linux should have it, but I've not gotten it to work - might be Chromium, might be a flag, or something else.

Here's more info on the status of WebGPU: https://github.com/gpuweb/gpuweb/wiki/Implementation-Status


Wasn't this the idea of the JVM?

Java bytecode was originally never intended to be used with anything other than Java - unlike WASM, it's very much designed to describe programs using virtual dispatch and automatic memory management. Sun eventually added stuff like invokedynamic to make it easier to implement dynamic languages (at the time, stuff like Ruby and Python), but it was always a bit of a round peg in a square hole.

By comparison, WASM is really more like traditional assembly, only running inside a sandbox.


Just like CLR bytecode, IBM i TIMI bytecode and many others since 1958.

For some reason, when people advocate for WASM outside of the browser, they only remember the JVM.


I think so, but that was the 90s, when we had far less hindsight to get it right. Plus that was mostly just Sun, right? WASM is backed by all browsers, and it looks like MS might be looking at bridging it with its own kernel or something?

I don't know. The integration of Java applets was way smoother than WASM.

Security-wise, perhaps a different story, though let's wait until WASM is in wide use with filesystem access and bugs start to appear.


I understand why, but still lament that Java applets were dropped like a hot potato, rather than the (fundamental) issues being solved.

Back then, I learned Java just to have fancy menus, quirky gimmicks and such. Until Flash came along, nothing else could do this. Where Java was rather open and free/libre, Flash was proprietary and even patented. A big step back. And it took decades before JavaScript reached parity in the ability to create such interactive, multimedia experiences in a cross-browserish way.

I can only imagine how much further along something like videoconferencing, realtime collaboration or gaming on the web would've been if this Java applet tech had been ever improving since inception.

(edit: I'm all for semantic, accessible, clean HTML/CSS/JS in my web apps. But there are lots of use cases for gimmicks, fancy visuals, immersive experiences etc. And no, that's not the hotel-reservation-form or the hackers-forum. But art. Or fun. Or ?)


> that was the 90s

In the meantime the CLR happened too.

And - to an extent - LLVM IR.


And of course the ill-fated Parrot VM associated with the Perl 6 project.

I think that was more of a language-oriented effort rather than runtime/abi oriented effort.

Parrot was intended to be a universal VM. It wasn’t just for Perl.

https://www.slideshare.net/slideshow/the-parrot-vm/2126925


Sure, I just think that's a very odd way to characterize the project. Basically anything can be a universal VM if you put enough effort into reimplementing the languages. Much of what sets Parrot apart is its support for frontend tooling.

“The Parrot VM aims to be a universal virtual machine for dynamic languages…”

That’s how the people working on the project characterized it.


I certainly think the humor in Parrot/Rakudo (and why they still come up today) is how little of their own self-image the proponents could perceive. The absolute irony of thinking that Perl's strength was due to its facility with text manipulation rather than its cultural mass...

It’s not a bad idea. A lot of the same people who worked on the JVM were around when the asm.js/WASM ideas emerged.

Isn't this typically solved with polyfills in the JavaScript world?
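For context, the usual polyfill pattern is simply to guard-define the missing feature. A minimal sketch using `Array.prototype.at` (a real method that older engines lacked):

```javascript
// Minimal polyfill pattern: only define the feature if it's missing,
// so native implementations are never overwritten.
if (!Array.prototype.at) {
  Object.defineProperty(Array.prototype, "at", {
    value(index) {
      // Negative indices count from the end, per the spec.
      const i = index < 0 ? this.length + index : index;
      return this[i];
    },
    writable: true,
    configurable: true,
  });
}
```

`Object.defineProperty` (rather than plain assignment) keeps the method non-enumerable, so `for...in` loops over arrays don't pick it up.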


I regularly add Symbol based features to JS libraries I'm using (named methods are riskier, of course)

    import { SomeStreamClass as SomeStreamClass_ } from "some/library"
    export class SomeStreamClass extends SomeStreamClass_ {
      [someSymbol] (...) { ... }
      ...
    }
I have not blown my foot off yet with this approach but, uh, no warranty, express or implied.

It's been working excellently for me so far though.


Much nicer than just adding your symbol method to the original class. :p


13 days late but for posterity:

Yes. Not Wanting To Do That was the motivating factor for coming up with this approach :D


I guess it could be improved with a simple check if SomeStreamClass_ already has someSymbol and then raise an exception, log a warning or some such.
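A minimal sketch of that guard, with all names (`extendWithSymbol`, `SomeStreamClass_`, `someSymbol`) hypothetical stand-ins rather than any real library's API:

```javascript
const someSymbol = Symbol("someSymbol");

class SomeStreamClass_ {}  // stand-in for the imported base class

// Refuse to add the symbol method if the base class (or anything on its
// prototype chain) already defines it, to avoid silently shadowing
// upstream behaviour.
function extendWithSymbol(Base, symbol, impl) {
  if (symbol in Base.prototype) {
    throw new Error(`${String(symbol)} is already defined upstream`);
  }
  return class extends Base {
    [symbol](...args) { return impl(...args); }
  };
}

const SomeStreamClass =
  extendWithSymbol(SomeStreamClass_, someSymbol, () => "extended");
```

The `in` operator checks the whole prototype chain and works with symbol keys, which is what makes the upstream check a one-liner.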

8 days late but for posterity:

So far I've only ever been using a private symbol that only exists within the codebase in question (and is then exported to other parts of said codebase as required).

If I ever decide to generalise the approach a bit, I'll hopefully remember to do precisely what you describe.

Possibly with the addition of providing an "I am overriding this deliberately" flag that blows up if it doesn't already have said symbol.

But for the moment, the maximally dumbass approach in my original post is DTRT for me so far.


I've used Codeberg for some projects and while their work and services are impressive and their progress steady and good, it's really not a proper alternative to Github for many use-cases.

"It depends", as always, but Codeberg lacks features (that your use-case may not need, or may require), uptime/performance (which may be crucial or inconsequential to your use-case), familiarity (which may deter devs), integration (which may be time-consuming to build yourself or be unnecessary for your case), etc.

