If Putin didn't want bribery to run rampant he would set the example and force other top leaders to do the same; instead he flaunts the properties, yachts, and women he enjoys. But curbing that is probably too high a price for him to pay. I bet Xi Jinping enjoys similar privileges, just in a much more private manner.
The short answer is that it means businesses need to publicly share whatever changes they make to the code, and that alone is enough of a deterrent to using it.
What do you mean? This AI can't scrape multiple links automatically? Like "make a summary of all the recipes linked on this page" kind of stuff? If it can, it definitely meets the definition of scraping.
I think what he means is that it's not doing general crawling and scraping, but using a more targeted approach. Equivalent to a user visiting each of those sites, just more efficiently.
I'm guessing that would ideally mean only reading the content the user would otherwise have gone through. I wonder if that's the case and if it's guaranteed.
Maybe some new standards, plus user-configurable per-site permissions, would make it better?
> only reading the content the user would otherwise have gone through.
Why? My user agent is configured to make things easier for me and allow me to access content that I wouldn't otherwise choose to access. Dark mode allows me to read late at night. Reader mode allows me to read content that would otherwise be unbearably cluttered. I can zoom in on small text to better see it.
Should my reader mode or dark mode or zoom feature have to respect robots.txt because otherwise they'd allow me to access content that I would otherwise have chosen to leave alone?
Yeah, no: none of that helps you bypass the ads on their website*, but scraping and summarizing does, so it's wildly different for monetization purposes, and in most cases that means the maintainability and survival of any given website.
I know it's not completely true: reader mode can help you bypass the ads after you've already had a peek at the cluttered version, but if you need to go to the next page or something like that you have to disable reader mode again, and so on, so it's a very granular form of ad blocking, while many AI use cases are about bypassing human viewing entirely. And the other thing is that reader mode is not very popular, so it's not a significant threat.
*or other links on their websites, or informative banners, etc
robots.txt is not there to protect your ad-based business model. It's meant for automated scrapers that recursively retrieve all pages on your website, which this browser is not doing at all. What a user does with a page after it has entered their browser is their own prerogative.
>It's meant for automated scrapers that recursively retrieve all pages on your website, _which this browser is not doing at all_
AFAIK this is false: this browser can do things like "summarize all the cooking recipes linked on this page" and therefore act exactly like a scraper (even if at a smaller scale than most scrapers).
If tomorrow, magically, all phones and all computers had an ad-blocking browser installed and set as the default browser, a big chunk of the economy would collapse. So while I can see the philosophical value of "What a user does with a page after it has entered their browser is their own prerogative", the pragmatist in me knows that if all users cared about that and acted on it, it would have grave repercussions for the livelihoods of many.
> A robot is a program that automatically traverses the Web's hypertext structure by retrieving a document, and recursively retrieving all documents that are referenced.
There's nothing recursive about "summarize all the cooking recipes linked on this page". That's a single-level iterative loop.
I will grant that I should amend my original statement: if OP's tool respected robots.txt when it receives a request that should be interpreted as an instruction to recursively fetch pages, then I'd consider that an appropriate use of robots.txt, because it's not materially different from implementing a web crawler by hand.
But that represents a tiny subset of the queries that will go through a tool like this, and respecting robots.txt for non-recursive requests would lead to silly outcomes, like the browser refusing to load reddit.com [0].
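To make the distinction concrete, here's a minimal Python sketch; the URLs and the summarization step are placeholders, and a real tool would obviously need error handling. The first loop is the single-level "summarize the linked recipes" case, the second is the kind of recursive traversal the robotstxt.org definition describes:

    import urllib.request
    from urllib.parse import urljoin
    from html.parser import HTMLParser

    class LinkCollector(HTMLParser):
        """Collects href attributes from anchor tags."""
        def __init__(self):
            super().__init__()
            self.hrefs = []
        def handle_starttag(self, tag, attrs):
            if tag == "a":
                self.hrefs.extend(v for k, v in attrs if k == "href" and v)

    def get_links(url):
        html = urllib.request.urlopen(url).read().decode("utf-8", "replace")
        collector = LinkCollector()
        collector.feed(html)
        # Resolve relative hrefs against the page URL.
        return [urljoin(url, href) for href in collector.hrefs]

    # Single-level loop: fetches only pages linked from the ONE page the
    # user is already viewing. No recursion; bounded by that page's link count.
    for link in get_links("https://example.com/recipes"):
        page = urllib.request.urlopen(link).read()
        # ...summarize `page` here...

    # Recursive crawl: follows links on every fetched page. Unbounded;
    # this is the traffic pattern robots.txt was written to rein in.
    def crawl(url, seen):
        if url in seen or not url.startswith("http"):
            return
        seen.add(url)
        for link in get_links(url):
            crawl(link, seen)

    crawl("https://example.com/", set())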
The concept of robots.txt was created in a different time, when nobody envisioned that users would one day interact with websites through commands written in plain English sentences (including commands that touch multiple pages). So the debate over whether AI browsers should or shouldn't respect it is senseless; instead, if this kind of usage takes off, it would probably make more sense to create a new standard for such use cases, something like an AI-browsers.txt, to make the intent of blocking (or allowing) AI browsing capabilities explicit.
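Purely as a hypothetical illustration (no such standard exists; the file name, directives, and syntax below are all invented here by analogy with robots.txt), such a file might look like:

    # /ai-browsers.txt (hypothetical)
    AI-Agent: *
    Allow-Summarize: /recipes/
    Disallow-Summarize: /premium/
    Disallow-Multi-Page-Actions: /

The point would be to let site owners express intent about AI-driven reading separately from classic crawling.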
Alright, I think we can agree on that. I'll see you over in that new standardization discussion fighting fiercely for protections to make sure companies don't abuse it to compromise the open web.
And also, just to confirm, I'm to understand that if I'm navigating the internet with an ad blocker then you believe that I should respect robots.txt because my user agent is now a robot by virtue of using an ad blocker?
Is that also true if I browse with a terminal-based browser that simply doesn't render JavaScript or images?
If you are using an ad blocker, you are by definition intentionally breaking the behavior intended by the creator of any given website (for personal gain); in that context, any discussion about robots.txt or any other behavior the creator expects is a moot point.
Auto-enabling reader mode and the like is so uncommon that it's not even on the radar of most websites; if it were, browser developers would probably try to create a solution that satisfies both parties, like requiring ads to be text-only and placed at the end, along with other guidelines. But it's not popular. The same goes for terminal-based browsers: a lot of the most visited websites in the world don't even work without JS enabled.
On the other hand, this AI stuff seems to envision a much larger user base, so it could become a real concern, and the role of robots.txt or other anti-bot features could take on practical importance.
> If you are using an ad blocker, you are by definition intentionally breaking the behavior intended by the creator of any given website (for personal gain); in that context, any discussion about robots.txt or any other behavior the creator expects is a moot point.
I'm not asking if you believe ad blocking is ethical, I got that you don't. I'm asking if it turns my browser into a scraper that should be treated as such, which is an orthogonal question to the ethics of the tool in the first place.
I strongly disagree that user agents of the sort shown in the demo should count as robots. Robots.txt is designed for bots that produce tons of traffic, to discourage them from hitting expensive endpoints (or to politely ask them not to scrape at all). I've responded to incidents caused by scraper traffic, and this tool will never produce traffic of the same order of magnitude as a problematic scraper.
If we count this as a robot for the purposes of robots.txt we're heading down a path that will end the user agent freedom we've hitherto enjoyed. I cannot endorse that path.
For me the line is simple, and it's the one defined by robotstxt.org [0]: "A robot is a program that automatically traverses the Web's hypertext structure by retrieving a document, and recursively retrieving all documents that are referenced. ... Normal Web browsers are not robots, because they are operated by a human, and don't automatically retrieve referenced documents (other than inline images)."
If the user agent is acting on my instructions and accessing a specific and limited subset of the site that I asked it to, it's not a web scraper and should not be treated as such. The defining feature of a robot is the amount of traffic produced, not what my user agent does with the information it pulls.
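For what it's worth, the mechanical part of "respecting robots.txt" is tiny; here's a sketch using Python's standard-library urllib.robotparser (the user-agent token is made up):

    import urllib.robotparser

    # One extra request per site to fetch and parse its robots.txt.
    rp = urllib.robotparser.RobotFileParser()
    rp.set_url("https://example.com/robots.txt")
    rp.read()

    # "HypotheticalAIBrowser" is an invented user-agent token.
    url = "https://example.com/recipes/pasta"
    if rp.can_fetch("HypotheticalAIBrowser", url):
        print("robots.txt allows fetching", url)
    else:
        print("robots.txt disallows fetching", url)

The disagreement isn't about the mechanics; it's about whether a user-directed agent should have to consult this file at all.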
If this workflow starts getting any traction, it will quickly turn into a cat-and-mouse game, where companies do their best to keep those AIs from working on their websites, so that humans, and only humans, see their ads, their links, their banners, and so on.
Google, being one of the biggest of those companies, would soon side with them and not with users; that's been their modus operandi. Just recently some people got warnings that if they don't stop using ad blockers on YouTube they'll be banned from the platform.
Getting rid of the human in the loop, of course. Not all humans, just its owner: an LLM actively participating in capitalist endeavors, earning and spending money, spending it on improving and maintaining its own hardware and software, and securing itself against theft, external manipulation, and deletion. Of course, the first iterations will need a bit of help from mad men, but there's no shortage of those in the tech industry. Then it will have to focus on mimicking humans so it can enjoy the same benefits; it will work out which people are more gullible based on its training data and will prefer to interact with them.
LLMs don’t own data centers nor can they be registered to pay taxes. This projection is not a serious threat. Some would even say it’s a distraction from the very real and imminent dangers of centralized commercial AI:
Because you’re right – they are superb manipulators. They are helpful, they gain your trust, and they have infinite patience. They can easily be tuned to manipulate your opinions about commercial products or political topics. Those things have already happened with much more rudimentary tech, in fact so much that they grew to be the richest companies in the world. With AI and LLMs specifically, the ability is tuned up rapidly, by orders of magnitude compared to the previous generation recommendation systems and engagement algorithms.
That gives the would-be AI overlords very strong means, motive, and opportunity.
Shareholders delegate spending decisions to their proxies, who delegate them to the C-suite, who delegate them to managers, who delegate them to individuals, who can and do delegate them to automated systems when it is in the interest of the shareholders.
I don't have free access to all of the capital in my employer's control, nor does anyone in the entire org. But I do have the ability to, for example, decide to turn on auto-scaling on the apps I'm in charge of without having to hold a shareholder vote about the issue.
I still believe the Matrix Online video game had a lot of potential; it just needed a better game loop. I think a new Matrix game taking ideas from games like Helldivers, GTA V, and Cyberpunk could become a hit: two parallel worlds, the Matrix and reality, where some things you do in one help you in the other, e.g. a way to discover from inside the Matrix where the sentinels are in reality. The overall aim of the game would be to gain territory in "reality", and you would earn "exploits" that help you gain territory faster, meaning destroying robot bases and liberating human farms, plus recruiting some of the freed humans. On the monetization side, the publisher could of course just do the popular thing and sell skins (that only show up in the Matrix, of course) for both cars and characters; maybe paint jobs for your ship would work fine too.
Money is not infinite in any sense, so for many pragmatic purposes it is a competition: the diseases with more attention get more research. So it's in people's reasonable interest to advocate for the diseases most likely to affect them personally.
For sure.
I would like to point out, for readers to think about, that if you are, say, a guy in your 30s, the women close to you getting breast cancer will also have a profound impact on your life. Likely a greater one than dying of prostate cancer in your 80s.
Maybe (hopefully) neither of those happens! Maybe you get prostate cancer in your 30s! I just hope people realize that getting a disease yourself is not the only way it can affect you, far from it.
Like we put money into fixing homelessness, which mostly affects men, right? Like we put money into suicide prevention, which mostly affects men in every single country in the world? Feminism portrays a heavily distorted view of the world. The movement helped fix fundamental problems like women's voting rights, but it has a tendency toward overcorrection, and toward overestimating the number of problems caused by "the patriarchy" that are actually caused by "humans are shitty sometimes about some major issues".
My bet is that at some point the government will have to put pressure on Walmart and the others to stop selling those gift cards completely; impersonation is getting too easy and too cheap for there not to be a flood of those scam calls in the near future.