On the other side, I've also noticed it appears to have been aggressively pruning its index over the past few years, so the fact that it's crawled your site doesn't mean it's necessarily searchable either.
Another "bug" that seems to manifest quite often: if I search for a specific phrase or unique word on a page that I found in a SERP, so I know it's crawled that page, it often doesn't return that page either.
Add to that the automatic CAPTCHA-hellban you get if you use "site:" in anything more than a tiny amount (and the one you still get if you search "too much"), and I realise that there's increasingly huge amounts of information out there on sites that Google may have crawled before and knows about, but doesn't want to show me for some reason. I remember it used to be much easier to find information about obscure topics even if it meant wading through dozens of pages of SEO spam; now it's nearly impossible for anything but the most vapid of queries.
Another bug I'm noticing lately is it'll flat out ignore things sometimes, even if you put a term in quotes or try to exclude it with a -leadingdash. About 30% of the time if I use those operators, they'll have no effect on the results. I don't understand why they'd make things worse on purpose, but I don't know how it could be just a "mistake" no one noticed.
Search engines in general have realized that it's more profitable to show you irrelevant results than to show you nothing. Furthermore, they've realized it's more profitable to show you irrelevant results laden with their ads than show you highly relevant results from ad-free sites.
This is precisely what happened. When google merged with doubleclick.net the new company should have been named doubleclick.net and not google. The old google ceased to exist at that point and was swallowed by an advertising company.
I strongly agree with this Bill Hicks bit on advertising:
I know there were rumblings in the late 00's and early 10's about how McDonnell Douglas culture and executives were ruining Boeing.
But some people take a step farther back than this and blame Congress for the 737 MAX. They basically forced the merger, and unhappy weddings make for unhappy homes.
I've seen plenty of mergers where there's a weird brain transplant and flippozambo! the acquired company's leadership is now in charge of the buying company. The fish, as they say, rots from the head.
I totally agree with you. Ads have influenced everything they’ve done since. It’s like a brilliant, talented individual who has been addicted to heroin for a decade.
Yeah but google also became wildly successful to the point that they blow money on ventures with no real business plan and give up two years later when they can't turn a profit. They effectively have a blank check at all times. They're more like a businessman who's addicted to making money at the expense of all their personal relationships.
Regarding unhinged ideas, doubleclick is quite old, but is it old enough that opening a hyperlink would've typically required a double click at the time? Or is the metaphor here that their ads are so amazing people are double-clicking them in ecstasy?
Double-click, as others have said, was never something you did with hyperlinks, even before the web.
Double-clicks were used with icons on the desktop because you could do more with an icon than just open it. You could move it, copy it, etc. Double-click was a convention for a shortcut to open the reference of the icon. A single-click would have not allowed those other actions.
Because of this, double-click became business speak for going to the next level of detail, digging into, etc.
The idea behind this name for ads was: this company makes ads relevant and compelling, so users drill into them and find whatever you want to advertise.
For what it's worth, because of the affordance you mention, users consistently double-clicked banner ads, and most other things they wanted to activate, even though they didn't have to, and even after they learned they only had to single-click the blue underlined things.
You can but that means hovering over files for a second will select them (and clear your previous selection). Other systems (e.g. KDE) manage single click to open without that annoyance.
> A single-click would have not allowed those other actions.
Yes it would. And did. Windows chose double click to open but other systems managed with a single click while still allowing you to drag around icons and files.
I find Amazon really irritating for that. I do a search for a very specific thing, and a ton of results always come back, often having nothing to do with my search request. And sponsored results both at the top and scattered through the results.
Amazon has gotten so bad that unless I know an exact part number or model, I don't bother. I'll go somewhere else for any research and only come back to Amazon if I want to price-shop what I found.
Even with an exact part number, it will often push related items first. I was searching for a specific thermal printer, literally using the PN (something like C18647585), and it still decided to show me "sponsored" and related thermal printers first. So it somehow knew that part number as a keyword for thermal printers, but just didn't want to show me the one result that actually would be helpful (it was a third party seller, so maybe that penalizes the result?)
There's Worldcat, though its site redesign last year made it useless for me.
I'm finding Open Library (part of the Internet Archive) is increasingly useful for book search.
You might also have success with a major library (e.g., the British Library, the Library of Congress, major US city libraries such as NYC, Boston, LA, San Francisco, or Chicago) and some academic libraries. Watch that these aren't in fact backed by Worldcat though. (Many local library systems are.)
Goodreads is tolerable, but mainly as a data source. The product itself has been in maintenance mode for... a decade?... or basically since the Amazon acquisition.
They rolled out a completely new design semi-recently for... half of the pages... and left the other half on the 10+ year old styles.
It just feels like Amazon is happy to take advantage of its dominant position with Goodreads having a more complete catalog than any of the other more open offerings. And yet, they seem to invest no effort in modernizing or improving the site, making it more performant, etc. The moderation tools kinda suck too — doing super common things like merging (incorrect) duplicate listings is a PITA.
Also the app exhibits blatant conflicts of interest like prioritizing buying new books from Amazon over, e.g., digital library loans, with no option for users to configure that.
Amazon owns Goodreads. It's not independent. It's also not mentioned on the site afaik. (They also own IMDb and a bunch of other internet companies that aren't Amazon-branded). If you want something independent, try storygraph or librarything.
This is particularly bad if you search for a type of thing, e.g. "mechanical keyboard". Many of its top suggestions will be for nonmechanical keyboards and that won't be obvious without reading their descriptions carefully.
"keyboard non mechanical like cherry switch membrane touch rgb light clicky gaming blogging ergonomic xiaomi redmi arduino android ios windows laptop desktop computer tablet phone", and if you sort by price, the first one is $0.99.
Then you have 5 different colors of plasticky $15 keyboards and a USB card reader for $0.99 to choose from.
What’s infuriating is how this lying has become normalized in “good” brands. For instance, try to buy a 60” TV. I do not think you can find one. They are all 59.5” and sold as ‘60” class’.
Stop buying from Amazon. I haven’t bought anything from them in years. There is nothing that Amazon offers for sale that you can’t find somewhere else, aside from maybe entertainment content that they produce.
Don’t reward bad behavior or they’ll keep doing it.
What you can find on amazon that you can't find elsewhere (at least here):
- All your regular non-food needs in one cart.
- No-hassle refunds if there is a problem with the order.
The first one is just convenience and I could do without it. The second one is where most other stores fail. Or at least enough of them that I don't want to risk having to phone their hotline or pay for return shipping because they or their delivery contractor fucked up - or fraudulently claim they tried to deliver when they made no such attempt at the specified address and instead want you to waste your time picking up the package at a random location across town.
Why would you expect to get a 4chan page? None of that data is persistent. IIRC Google relies on links to the page, so that is impossible; plus the content rotates constantly as threads drop off the last page.
Generating money for google is not the only metric that matters for the users. The incentives are perverse from the perspective of everyone other than google executives and investors with significant google holdings.
I notice this happening when the actual query would have returned 0 results: Google "helpfully" will modify your query (such as by dropping quotes) to generate more results.
This is super annoying because it doesn't appear to inform you of this anywhere in the UI, until you click through to page 2 and see what it modified your query to be.
Over the years, I've found the frequency of "0 result" queries has gone way up. Subjective anecdata from me, but it's a pretty big difference. There must be some large areas of their index that have been dropped over time.
From what Google's hinted at and probably your own experiences (which I reckon are like mine), it's pretty clear that most folks aren't great at Google searches. This might be why Google has leaned on AI to "guess" the best results. They figure their AI can predict what you want better than you are able to specify via your search query.
Which has the really annoying side effect that now you actually need to produce a worse query to get decent results.
A few years back I used to scoff at people who wrote searches in the form of literal questions instead of boiling things down to key terms, e.g. the "how do i..., what is the largest..." type searches.
Nowadays I not only need to write stupid searches like this to get better results but quite literally find that my brain has adapted to the pattern and my past skill at crafting salient, key-term, operator-driven queries is eroding.
I've noticed the same. And sometimes you'll get search results showing 10+ pages, but if you actually follow through them, the results die by the second page. Google also omits many domains from search results now.
Sites like Twitter and Instagram also frequently completely change the search term now to something else for certain queries. This practice is anti-competitive of the highest order. The very foundation of having a text search is to have an exact query match to begin with... The alternate spelling item should only be a suggestion in results at the most, but they've flipped this now, and that's outright deceptive.
Might be desktop exclusive? For me on mobile (testing another random phrase as now yours hits this post), I don't see that text, or any other indication the query has been modified.
I'm used to that. But this is worse: it happens even when the search I enter would give the ideal results, ones I only reach because I eventually contort the search parameters enough to find some results.
Funny that people call these "bugs" as if anything related to google search happens by accident.
They don't need to waste the eng resources or infrastructure on rock solid search anymore, they own the market and got all the users into their funnel of products, most locked in for life.
Search results still show sponsored listings; they still have all the users and all the profit, and a lot less of the profit-sucking operational costs it took to be good at what made them a household name: search.
They aren’t inserting junk, they just don’t do anything to rank quality results above the junk anymore. The junk was always there, on page 2 and beyond, and who would ever need to hit those pages. Nowadays I’ll be 15 pages deep and so far off base of the search term that I could write a better index with curl and regex.
The issue isn’t that they are watering down the cream in the milk, it’s that the cream isn’t part of the milk at all now.
There was a Google search engineer on Reddit who personally claimed the opposite: Google is going down the drain, but the alternatives aren't any better. Of course I can't find it now, thanks Google.
I wish there was a search engine that ran like mid 2000s google but with a social media component so you can down vote SEO spammer blogs into oblivion.
Unfortunately, content farms can push new websites and blogs faster than you could ever downvote them. LLMs are going to make that task increasingly easy. I've no idea how we're ever going to be able to search anything anymore using classic search engines. We either go back to website directories, or forward to AI-generated content.
Perhaps that is one additional layer of friction that will make human moderation / social voting feasible. The fire hose of AI trash content will come too rapidly for it to work at layer 1 (all content), but if the barrier to entry is a financial transaction to take over placement in a human-curated webring or directory it becomes easier to moderate / vote away the trash.
Add a trust metric and chains of provenance. Bad ring link -> bad trust percolating up that chain. Little trust, your site isn't always shown as part of the ring. Too much loss of trust, you're out.
(Ultimately, this is a bad facsimile of human group behavior - all the way up to shunning people who deeply violate group norms. And I don't think it'll scale super-well. )
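To make that concrete, here's a minimal sketch of how a penalty might percolate up a chain of endorsements; the sites, penalty factor, decay, and thresholds are all made-up assumptions for illustration, not any existing system:

    # Hypothetical sketch: trust percolating up a webring/directory chain.
    # Sites, penalty, decay, and thresholds are illustrative assumptions.
    SHOW_THRESHOLD = 0.5    # below this, a site isn't always shown as part of the ring
    EJECT_THRESHOLD = 0.2   # below this, it's out

    def propagate_penalty(trust, endorsers, bad_site, penalty=0.4, decay=0.5):
        """Penalise bad_site and, with decreasing weight, whoever vouched for it."""
        trust[bad_site] = max(0.0, trust[bad_site] - penalty)
        for hop, endorser in enumerate(endorsers.get(bad_site, []), start=1):
            trust[endorser] = max(0.0, trust[endorser] - penalty * decay ** hop)
        return trust

    # Who vouched for whom (site -> list of endorsers up the chain).
    endorsers = {"spam.example": ["ring-maintainer.example", "root-directory.example"]}
    trust = {"spam.example": 0.5, "ring-maintainer.example": 0.9, "root-directory.example": 0.95}

    trust = propagate_penalty(trust, endorsers, "spam.example")
    for site, score in sorted(trust.items()):
        status = ("out" if score < EJECT_THRESHOLD
                  else "sometimes hidden" if score < SHOW_THRESHOLD
                  else "shown")
        print(f"{site}: {score:.2f} ({status})")

The bad site drops out entirely, while the ring maintainer and root directory take smaller hits the further up the chain they sit.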
Except there's no provenance or root of trust. There is (IIUC) no back-propagation of a penalty if sites violate trust, just an overall observational measure.
And I'd still say pagerank did work really well in an Internet where there was overwhelming trust. But in a world where default-trust is a bad stance, I believe there needs to be an equivalent of what "You can trust X" does in small in-person groups. (Or, alternatively, "Sheesh, X went off and just destroyed all trust.")
I do think it'll need to be more than a single metric, too. Trust is multidimensional by topic (e.g. "I trust the NYT's data science folks, I have zero trust for the OpEds"), and it is somewhat personal. (E.g. I might have experienced X lying to me, while they've been 100% honest to you - maybe in/outgroup, maybe political alignment, maybe differing beliefs, etc. Ultimately, what we call trust in an indirect situation is "most of my directly trusted folk vouch for that person".)
Keyservers. You decide which keyservers to register with and to trust for verifying others. Browsers would handle en-decryption automatically and allow you to flag, filter, or finger (in the Unix sense).
I used DDG for a while, but DDG's quality fell precipitously a few years ago (similar issues where it ignores quotes and won't find pages even if you search for the title string exactly, etc) and I eventually came back to Google which has also been increasingly frustrating.
> I wish there was a search engine that ran like mid 2000s google but with a social media component so you can down vote SEO spammer blogs into oblivion.
There's no way this won't get abused, but the SEO stuff is out of control. Not even spammer blogs, but if you have a quick question like "how do I check tire pressure" you will only get articles that start with a treatise on the entire history of car tires and the answer is deeply buried somewhere in the article. My guess is that Google sees that we're on the page for a longer time than we would spend on pages that just return the answer, and they assume that "more time on page" == "better content" or something.
DDG has become ridiculous. They seem to be merging in "local", geoIP-based results no matter what country I select on the region list (or whether I disable it). Very often completely unrelated (but local) stuff appears as the 5th or 6th result, midway down the first page.
Most egregiously, I will search for something very rare (e.g. about programming) and DDG will return results about my city's tourist/visitor info. It's as if it just keeps ignoring words from the search prompt that return no results until it runs out of keywords, and then it's just the geoIP results.
I hate this forced localization so much, and it's everywhere. The internet used to be a place where you would actually encounter stuff outside your locale.
That is because DuckDuckGo started relying almost entirely on Bing for their regular search results after first Yahoo gave up maintaining its own index and then Yandex became part of a natio non grata, leaving them to choose between partnering with Bing, partnering with Google, or creating their own index: https://help.duckduckgo.com/duckduckgo-help-pages/results/so...
The tire pressure query is exactly the kind of thing that AI should be able to handle easily, though. At which point google has an incentive to sort their competitiveness out.
Love kagi. The first time I got the "your payment was successful" notification I felt like I'd never get that much value out of it. But now, a few months later, I feel like I could never go back.
No? At least not like you are implying. Kagi queries multiple data sources and synthesizes results. This means Google’s failure to index does not impact Kagi in the same way as it would DDG (with Bing).
Though I'm a subscriber myself, Kagi doesn't really add results, does it?
It merely weeds out the trash for you. So you can get to the bottom of search results.
Looks promising, though I noticed that it doesn't encode queries properly when searching. For example, if you go to the homepage and search for "../robots.txt", you'll be redirected to the site's own robots.txt file.
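Presumably the fix is just to percent-encode the query before it's dropped into the URL. A rough sketch of the idea (the /search?q= endpoint here is a generic assumption, not that site's actual API):

    # Percent-encode user queries before building a search URL, so input like
    # "../robots.txt" can't be treated as a path. Endpoint is a placeholder.
    from urllib.parse import quote_plus

    def build_search_url(base, query):
        return f"{base}/search?q={quote_plus(query)}"

    print(build_search_url("https://search.example", "../robots.txt"))
    # -> https://search.example/search?q=..%2Frobots.txt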
What I want is a "serious mode" that makes it favor primary sources, peer reviewed papers, and raw data. When I search for economic data, I don't want a million news articles referencing pieces of it. I want the raw data release. When I search for some video going viral, I don't want a million videos of journalists talking and showing clips. I want the full raw video.
Beautifully said! As a thinker of philosophy, I have come to understand that our clip-society is this way by design. People can express power over others if they tell you a construction and then show a clip to support it. They really don't want you to see the source/what it is/the truth. They want you to see what they show you. This problem is accelerating in western societies and it is a fundamental problem of human nature. Journalism is the healthy expression, and what we see in today's media is the sickly end.
> I wish there was a search engine that ran like mid 2000s google but with a social media component so you can down vote SEO spammer blogs into oblivion.
This is sort of what I've been trying to do with Marginalia Search, except I don't really believe a voting system would work. It's far too easy to manipulate. Been playing with the thought of having something like an adblock-list style system where domain shitlists can be collaborated on and shared without being authoritative for the entire search engine.
My search engine is still pretty rough around the edges and limited but I think it works well enough to demonstrate the idea has some merit.
> Been playing with the thought of having something like an adblock-list style system where domain shitlists can be collaborated on and shared without being authoritative for the entire search engine.
Even just personal shitlists would be golden and make just about everyone happy.
Something I've wanted (which probably exists as an extension in Chrome?) for Google searches is a simple blacklist. Just a little button and confirmation next to a result, telling it to never show this blog-spam-ad-laden-SEO-mess of a page to me ever again. Maybe it's an uphill battle, but for some niche topics (like certain games) there are some sites I keep having to scroll past and sometimes accidentally click that are written in SEO-speak and say a lot without saying anything at all.
Remember Guestbooks? You'd visit a website, volunteer your name and which country you were from, and leave comments. And it wouldn't be a cesspool of spam and porn and XSS attacks. How quaint!
Oh gosh, yes! And reading the guestbook was always so fun. An elderly friend of mine passed away in 2018, and in doing a (google) search of him, I found guestbooks he'd signed 20 years ago.
You would look for a thing and the first five pages were random mailing list discussion archives discussing how the thing was 5 years before... Altavista was impressive, but there is a reason why it went away.
> I wish there was a search engine that ran like mid 2000s google but with a social media component so you can down vote SEO spammer blogs into oblivion.
I want this too, but I think an often understated aspect of this issue is that by this point Google has absolutely trashed the web of that era. In these threads people will say "the content you want isn't out there, it's all on social media now" -- and they're largely right, but I think Google is the party most responsible for mutilating the web to the state it is in now, and users fled to social media partly because it seemed like a safe haven.
What we need is a concentrated effort to rebuild the web. Take the best parts of what we've learned and combine with the best parts of what we've left behind and try to build something better, for humans, not for advertisers and hyper-capitalists.
That will take time, energy, and people who remember what we lost and believe we can build something better. A better search engine alone is not enough.
Largely right, but actually a lot of that stuff is still out there. The personal and hobby pages, forums, blogs, etc.
Google just doesn't know that they exist anymore, or rather doesn't want us to know, because those sites are not commercial enough or big enough.
Almost without fail, no matter what you search for, it tries its best to turn it into a search for a product or service. And those content oriented websites don't fit that, so it just pretends they don't exist.
The web changed when every kinda slimy business bro realized they could monetize gaming search results. No matter what your fantasy web looks like, be assured, people will game it to the point it's not what you intended.
If I take that viewpoint on everything I might as well live as a recluse in the woods and avoid people altogether. I have to believe that there are enough of us out there that genuinely want to build better things for people.
The web, just like the real world isn’t static. Becoming and staying intellectually, emotionally and physically mobile may be the only long term strategy to avoid ending up in one or the other dystopia, sooner or later.
When rates of change were slower, you might only have to “move” once in your life, but with increasing rates of change in our human experience, staying nimble is arguably of ever increasing importance.
And my point is that there are probably a lot of those motivated people working on the problem today. You make it out as though we've arrived at this state by either lack of effort or competence by Google/Microsoft. My guess is that every time they change the algorithm, the spammers adapt too. That's inevitable and would be just as much of a challenge for your supposed utopia. If you have some secret they don't, there's certainly plenty of money to be made.
I think google does OK with the syntax it still supports for text queries, but if you switch to the images tab it just throws all of that stuff out the window. I would love to be able to search for "cat eating watermelon" or whatever and only get results with cats eating watermelon, ordered by the proximity of that text to the image returned. Hopefully AI is going to do something for that, but the state of the art, as embodied by the biggest player (Google), is shamefully deficient.
It's even stupider than that. There are only two major, publicly available web indexes in the USA, Google's and Bing's. After 24 February 2022, DuckDuckGo ended their partnership with Yandex, and since then they say "we have more traditional links and images in our search results too, which we largely source from Bing" https://help.duckduckgo.com/duckduckgo-help-pages/results/so...
Bing at least licenses its index to partners on a commercial basis, as did Yahoo until they gave up indexing the web. I am sure that the NSA, the Chinese government, the ahrefs website, and other organizations have comprehensive indexes of the web which they don't share in this way.
Be careful! The Google search guys will come on HN and gaslight you about this, claiming that the advanced search functionality works perfectly and it's simply user error.
We know it's not, but expect them to try to tell you you're imagining things.
> On the other side, I've also noticed it appears to be aggressively pruning its index in the past few years, so the fact that it's crawled your site doesn't mean it's necessarily searchable either.
I've noticed this as well. I have a crappy website for my app that I need to do better marketing for (not my priority just now), but I've noticed that, however crap it is, I have received ZERO incoming hits from Google, apart from a couple of people who have literally just googled my domain name.
I do not believe for a second that, in the 2 months the page has been up, there wasn't a single query made globally for which my website was at least a bit relevant. Either that, or the spam problem Google has is much bigger than anyone thinks.
Yet another data point in favour of the Dead Internet theory.
You could try the google search console - it gives you a view on what hits/clicks have come in over time.
edit: Hah. I notice it suggests using it at the top of the page if you use 'site:..." - and I only get 5 results for my site when the console claims to have indexed 10 times that many!
edit2: Also duckduckgo returns more like 15 hits ...
Silly to see people complaining about search results and indexing without backing those claims with data from search console. It’s like devs turn off their brains when it comes to marketing because they don’t like it.
Google's bloody Search Console says I got 16 impressions in 2 months for literal searches of my domain name, and nothing else. Funny seeing people thinking I got those figures by reading tea leaves.
I have all sorts of things that I wrote years ago and I can never find them searching by title unless I put the specific name of the site in the query. I sure can find the slideshare though where some guy from Oracle stole not only my title but much of the content from my blog.
"The dead Internet theory is an online conspiracy theory that asserts that the Internet now consists almost entirely of bot activity and automatically generated content, marginalizing human activity. The date given for this "death" is generally around 2016 or 2017."
What I find funny about that framing is that, regardless of whether or not the theory has merit, a conspiracy theory by definition asserts that there exists two or more people conspiring with the intent to produce the alleged outcome. From what I understand, dead Internet theory alleges no such collusion or intent. I could be wrong but I believe that it merely suggests that the amount of bot-generated activity has come to dwarf human generated content to the point where the Internet is effectively "dead" from the perspective of its original purpose: humans sharing human knowledge.
10 or so years ago I wound up blocking everyone other than Google in my robots.txt because I was sick and tired of webcrawlers from China crawling my site twice a day and never sending me a single referrer. Same with Bing. Back when I was involved with SEO the joke was you could rank #1 for Viagra on Bing and get three hits a month.
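For anyone wanting to do the same, the robots.txt rules look roughly like this (only well-behaved crawlers honour it; the rude ones ignore it anyway):

    User-agent: Googlebot
    Disallow:

    User-agent: *
    Disallow: /

An empty Disallow for Googlebot allows it everything, while the wildcard rule blocks every other crawler from the whole site.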
At least so far, according to Cloudflare, bots make up around 1/4 of all internet traffic. But that could be pretty far off depending on how they get those estimates.
The figure I saw most recently was 42%. Weirdly my brain can remember the number but not where I saw it.
But what I'm curious about, whichever number is true, is whether people mean "malicious bots" when they say this, or just any kind of autonomous agent. And also whether they are counting volume of data or simply network requests.
Because if by "bot" they just mean "autonomous agent making a network request" then honestly I'm surprised the number isn't higher, and I don't think there's anything wrong with it. Every search crawler, every service detector, all the financial bots, every smart device (which is now every device) and a thousand other more or less legitimate uses.
I've got a script for parsing my web logs which removes all the lines which match persistent indexers/bots/scrapers and any obvious automatons. Logs generally shrink to 40-50% of their volume, so I'd at least double CF's estimate.
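The filtering doesn't need to be fancy; a rough sketch of the approach (the user-agent pattern and log filename are just assumptions, tune them to your own traffic):

    # Drop log lines whose user-agent matches known bots/indexers and see how
    # much volume is left. Pattern and filename are illustrative assumptions.
    import re

    BOT_PATTERN = re.compile(r"bot|crawler|spider|indexer|curl|python-requests",
                             re.IGNORECASE)

    total = kept = 0
    with open("access.log", encoding="utf-8", errors="replace") as f:
        for line in f:
            total += 1
            if not BOT_PATTERN.search(line):
                kept += 1

    print(f"kept {kept}/{total} lines ({kept / max(total, 1):.0%} non-bot)")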
In this video they rename it from 'theory' to 'prophecy'. As in the internet isn't quite dead yet, but its rot filled bloated body is near its dying breath.
What I fucking hate is writing a query, sometimes even with parts in double quotes to clarify, and google "helpfully" correcting it to something unwanted, and then putting up the damn captcha when I click the link to search exactly what I want.
Another thing I've noticed: Google only indexes what people search. Meaning, sometimes if you search for something obscure and you don't get good results, come back a week later and you'll get much better results because your query is now a part of their indexed search terms.
This I noticed some years ago. It seems much like, if the number of returned results doesn't meet a given threshold, some kind of optimizer runs overnight on these searches in order to provide a more extensive result set.
Super interesting discovery! I wonder if whatever algorithm Google is using has reached its scalability limit on today's Internet, and it takes some kind of overnight batch job to do obscure searches usefully. Maybe all Google Search is doing is serving a giant cache of slow search results.
This was some years ago. Notably, I observed it in relation to search suggestions. You could enter a search and get zero results, but a day or two later, you'd get at least a suggested search term (regardless of how accurate or meaningless this may have been). So I guessed these were built up, at least partly, retroactively. With results now happily including these "sympathetically adjusted search terms" without presenting this as an explicit option, I'd guess this may now happen automatically.
> Add to that the automatic CAPTCHA-hellban you get if you use "site:" in anything more than a tiny amount
Pretty much any advanced operators seem to do it for me, notably "intitle:" and "inurl:". I'd wager that there are a lot of automated searches using these to look for exposed admin interfaces, but I find them extremely useful for filtering out the crap that clogs up results when a ton of news sites all regurgitate the same viral press release or wire article.
Just FYI, the database that is used for site:domain.com queries is actually not the same database that they use for live searches.
So you may see a certain number of pages using the site: command, but fewer pages (or none at all) may actually be indexed.
If you want pages indexed, put them in an XML sitemap file, make sure there are internal links to them on your site, and external links from other sites really help. Third-party indexer tools help as well.
Google results have become so bad that I use "site:" for a majority of my searches these days. I have a bunch of Chrome search engine keywords set up so that I can go straight to results on Wikipedia, Economist, Reddit, Stack Overflow, Cppreference, etc.
It's concerning that they're even nerfing site search, which seems like a core feature for a search engine. You could argue that Google isn't really a search engine any more, but rather a general knowledge engine and advertising platform. I hope somebody can build an alternative to Google that does what a search engine is supposed to do, i.e. index the web without all the extra garbage. But maybe SEO has killed that dream at this point.
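For anyone who hasn't set this up: in Chrome's search-engine settings you add a shortcut keyword plus a URL template with %s where the query goes. The keywords and sites below are just examples:

    Keyword: w    URL: https://www.google.com/search?q=site%3Aen.wikipedia.org+%s
    Keyword: so   URL: https://www.google.com/search?q=site%3Astackoverflow.com+%s

Typing "w foo" in the address bar then runs a site:en.wikipedia.org search for "foo".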
> there's increasingly huge amounts of information out there on sites that Google may have crawled before and knows about, but doesn't want to show me for some reason
This is some machine learning stuff they are doing: instead of indexing all the specific keywords, they are creating vector embeddings, basically summarizing what's on the page and matching on similarity to your query rather than on specific keywords. Good for casual searches, but extremely annoying for power users.
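A toy illustration of retrieval by embedding similarity rather than exact keyword match, which is roughly the behaviour being described; the tiny vectors here are invented purely for the example (real systems use learned embeddings with hundreds of dimensions):

    # Rank documents by cosine similarity of embeddings instead of keyword match.
    # The 3-dimensional vectors are made up for illustration only.
    import numpy as np

    docs = {
        "page with the exact phrase": np.array([0.9, 0.1, 0.0]),
        "vaguely similar page":       np.array([0.6, 0.6, 0.2]),
    }

    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    query_vec = np.array([0.7, 0.5, 0.1])   # pretend embedding of the query
    ranked = sorted(docs, key=lambda d: cosine(query_vec, docs[d]), reverse=True)
    print(ranked)   # the "similar" page can outrank the exact-phrase page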
Anecdata, but I can confirm a uniform and long-standing experience that adding colon-based operators to a search query results in a CAPTCHA challenge every single time on a subsequent search, even if the subsequent search is 'vanilla' (i.e., no operators). It has been like this for more years than I can remember. Apparently this kind of 'advanced' usage is an indication of bot activity.
I have never had this experience once in... decades? I use operators such as site: frequently. I suggest there's some other property of your environment that's setting captcha off - vpn, shared sketchy ip/network, etc. Bad actors suck.
Same. I use it every day for the majority of my G searches, and have never once seen a captcha (except when using a VPN). OP, are you logged in to google? I am. Wonder if that’s the difference?
welcome to the machine learning future, where anything you do that is a statistical outlier gets you algorithmed by a machine that is incapable of reason but knows when you're different.
As a person who has been a statistical outlier most of my life, I am dreading this. It's bad enough dealing with human impressions and misjudgment, but now we get it from our computers too, which used to be logical, deterministic havens.
For what it’s worth this never, ever happens to me. These days I only get captcha’d when someone’s laptop on the same network gets owned and is being used to hit google.
Hmm... VPN, big proxy, or some other contributing factor? I use site: all the time, not on chrome, and without being logged in... If I've ever gotten captchas doing so, it wasn't frequently enough to see a pattern. Maybe some property of the site makes a difference that puts your usage and my usage on either side of that fence?
Anecdotal, but this happens to me a lot, and not just with the "site:" operator. Generally using any of the advanced operators seems to set it off. Things like inurl:, intitle:, etc, trigger it also. Not every time, but after a few times. From a normal ISP connection, no VPN, even while logged into Google, etc.
I've never gotten a CAPTCHA-hellban that I know of, but I absolutely get a CAPTCHA when I use "site:" for more than just a couple searches. (It sounds par-for-the-course w/ Google, though...)
Yeah, I get these. The problem is that the Captchas take forever to fill out (like 5 minutes of challenges). But the worst part is that the captchas are asking for wrong answers. It tells you to select scooter and there's no scooter in the photo but it thinks there is. So you just end up stuck in a captcha loop for a long time.
I am not sure why I get them but it might be due to using anti-fingerprinting tools.
I've wondered if it isn't intentionally impossible to solve, because "the algorithm" decided that you're a bot or malicious and they want to spin your cycles endlessly. The effect on me now is that I won't even try anymore; I'll just take a different route. That may even reinforcement-teach the system that I was a bot that couldn't solve it.
I think it's more malicious than that. They know I use privacy tools and can't be tracked -> they can't make money on me -> bully me into not using their service.
It may also be part of their anticompetitive war on other browsers. I get captchas constantly in a new default Firefox profile, but not in a new default chrome profile. Spoofing user agent to recent chrome agent in Firefox makes the captchas happen far less often for me.
This is probably the common thread among all the people reporting this. As an alternate data point, I haven't experienced the captcha from using advanced search queries.
>now it's nearly impossible for anything but the most vapid of queries
I've noticed that myself, looking for very precise content, which I know is out there but failed to bookmark. (Most recently, for amateur astronomy and roleplaying.) The solution to finding niche stuff now seems to be digging through relevant reddit or forum threads, hoping someone posted a link to it.
Very true. Some client websites have had half their keywords gone from position 1-3 to deindexed, then back, then gone, then back, and that's been since February 2023.
I assume it is so that websites don't abuse it to build search boxes for their own sites without showing ads?
E.g. I can build a searchbox on mywebsite.com, and if you type "hamster" I'll just query google for "site:mywebsite.com hamster" and return the results to you. That way, my site can be static but still have a search box, and google has all the work but gets no money.
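A sketch of what that freeloading search box amounts to (the domain and query are placeholders):

    # A static site's "search box" that just forwards the visitor's query to
    # Google, scoped with site:. Domain is a placeholder.
    from urllib.parse import urlencode

    def scoped_search_url(user_query, domain="mywebsite.com"):
        return "https://www.google.com/search?" + urlencode(
            {"q": f"site:{domain} {user_query}"})

    print(scoped_search_url("hamster"))
    # -> https://www.google.com/search?q=site%3Amywebsite.com+hamster

Which would also explain why heavy site: usage looks like automation to them.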
Startpage does the same thing when I use site:, but with no captcha to bypass the hellban. Sometimes it just shows no results intentionally. Refreshing the page fixes that one.
They are probably trying to reduce "misinformation" by removing most of the web from their index. With AI, they could just ask bard, "does this website contain any information that would be considered misinformation?" and then just ban it.
If you want "misinformation," or to just search the web like it's the mid 2000s, you can use http://Yandex.com. They do a pretty good job on controversial queries. Google has gotten so political that they even have this "results are changing rapidly" page they return when there's been some new political hot topic that they haven't gotten the commissars at headquarters to weigh in on yet as to what's going to be the official narrative.[1]
Yandex is also censored but in the other direction. Probably does censor less than Google but enough that you shouldn't rely on it alone for topics that involve Russia. Their index is also limited in general when it comes to non-russian content. But it does return many things that Google would rather you not see so it is invaluable if you want to get the full picture.
Yandex image search also has an infinitely better interface, linking you directly to image sources and not being full of links to sites that want you to sign up before showing anything, like Google is. It's still not perfect, and IME it often groups images too aggressively, which effectively hides "similar" results.
Nice try FBI. In all seriousness though, has it actually gotten so bad that yandex of all search engines is less censored? Or is it just less censored when it comes to topics controversial to the US (and not russia)? The fact that so much censoring is going on that google has a "hold on while we censor this" page is insane.
I'd never noticed any issues with Google until a few months ago where I was googling an exact phrase that I knew appeared on one site. Google gave me nothing but DuckDuckGo found it.
The site is probably 20 years old and has no SSL, but still... giving me no results is worse than giving me the one correct result.
>and I realise that there's increasingly huge amounts of information out there on sites that Google may have crawled before and knows about, but doesn't want to show me for some reason.
Why provide the best product when you only have to have a product slightly better than your competition? After that, everything is profit.
Couple that with the fact that a huge portion of new sites seem to be bot-generated shit copied from other places on the internet, and it seems Google has given up on the open web.
As long as people don't abandon search, yeah, they do. If they lose their absolute dominance in search, they will automatically have competition on adsense too.
Or maybe Google disagrees with my assessment, but I can't imagine what kind of inside information would make them do that. It looks like a very clear and inescapable reality to me.
You should know that's not how capitalism works. They have to keep making more money per dollar every year or they get punished in the market. They've tapped out on their limits of growth and now actual costs are increasing due to floods of automated crap at levels far beyond what we had in the past.
They have a practical monopoly on web and mobile ads, if they really are stagnating then all they need to do is jack up prices by a fraction of a cent and it's already billions in profit. I'm sure they have no problem increasing revenue over time.
Given how stupidly common ads are, increasing prices and upping scarcity would be a good thing overall anyway.
I've always wondered why we got rid of curated directories and changed to search for almost everything (and yes I do realize that volume of sites is problematic).
Also, anything past the first page will just show you the same crap as the first page. I used to be a power user of operators like 'site:' but agreed, it results in a captcha every other page sometimes.
> so I know it's crawled that page, it often doesn't return that page either.
This is your misunderstanding. The fact that a thing was in the index does not ensure it will always be there. Things disappear from the web all the time. Serving fresh docs means not only crawling the new stuff but also deleting the unreachable stuff promptly.
No, I literally did a site: search seconds after visiting the site to see if it had any other pages with what I was looking for, and it found zero results --- not even the original page I found.
I recognise this as well. I write for a living, so I'll do lots of searches to cross-check stuff. But if you search too quickly, or too 'weirdly', or whatever, you'll have to pick out bridges or zebras or whatever is the current fashion in Captcha.
The best one is "select the photograph containing a crosswalk". How am I supposed to know what a crosswalk looks like in each & every culture on earth?
I mean, as a human, you are expected to use context clues.
You don’t need to know the markings used for crosswalks in every place around the world to know what a crosswalk looks like based on its purpose. There’s only so many ways to create a pedestrian crossing across a street, after all.
If anything, that seems like an extremely appropriate choice for something attempting to restrict access for bots that wouldn’t necessarily be able to act on the same context clues and intuition.
This doesn't really cross cultural boundaries. For example, the skull and crossbones means nothing to Iraqis despite universally being seen as a sign of danger and caution in the US.
You're asked to point out the designated crossing area for pedestrians across a street. Sure, some places use crosswalk stripes perpendicular to the street, others use squiggles, others use lines on the sides, and some don't use any markings at all, but it should be plainly obvious to anyone, anywhere in the world where the designated area is based on there being some marking, or control devices, or literally people walking in the photo.
This isn't rocket science. Using contextual clues to figure something out is literally one of the most basic human abilities.
I share your frustration but I’ve come to learn that a lot of people don’t process things contextually and have an extremely difficult time with problems or reading that require picking up context clues.
I assume you don't have to answer correctly on the crosswalk question, you just have to answer the way most humans answer the question when asked... but I have nothing to back that up.
I'm not sure. It used to be that you could just select whatever as long as you did it with a mouse (so it had human-like cursor movement). But lately reCaptcha and hCaptcha have both been yelling loudly every time I didn't select one square that had a car or staircase or whatever it makes you look for, even if that object is relatively small and easy to miss.
I think this is because the primary purpose of the exercise is AI training, though.
If you use advanced search features say 10 times in 10 minutes or whatever (a reasonable amount when refining a search if you ask me), you're quite liable to be elected to have a trial of endurance against the "prove you are a human" feature, having to solve multiple (my record is 16) consecutive "select all images that contain BLAH" tests.
I had to solve thousands of captchas as part of the yahoo groups archiving project. You only have to choose four of the images, whatever the test is, and it's not really precise, so you can make small mistakes and it still will let you pass.
>> CAPTCHA-hellban you get if you use "site:" in anything more than a tiny amount
> Please explain this point.
If Google thinks your searches are unusual, it will force you to answer captchas to see the results. They assume anyone using advanced features must be trying to abuse their service.
If your activity seems automated in some way, Google will give you a captcha and sometimes it'll give you one on every search even after you've completed one captcha. But the reason for this is probably a combination of IP usage (e.g. a VPN IP shared between users), browser anonymity, and how specific you're getting with your search results, and not just the fact that you've done 20 searches today with "site:".
It's the height of irony that automated processing produced the AI chatbots that are vogue today, but if your activity is automated, Google considers it a crime. I say irony but that implies the hypocrisy was surprising.
> On the other side, I've also noticed it appears to be aggressively pruning its index in the past few years
In terms of breadth and depth, the quality of google search has declined noticeably. They don't have any real competitors in search so they can do whatever they want.
> now it's nearly impossible for anything but the most vapid of queries.
Rather than giving us what we want, they want to give us what they want: a narrow band of approved results. Youtube is like this as well, but then again, youtube and google are both part of Alphabet. It's like google news was a test run and they slowly exported it to search, youtube, etc.
SERP: Search engine results page. I asked ChatGPT.
"SERP stands for Search Engine Results Page. It refers to the page displayed by a search engine in response to a user's query. When a user enters a search term or keyword, the search engine generates a list of relevant web pages and presents them in the form of a SERP. The SERP typically includes a combination of organic search results, which are the regular listings based on relevance to the query, and paid advertisements, which are sponsored listings that advertisers pay for to appear prominently on the page. SERPs often contain additional elements such as featured snippets, knowledge graphs, image or video results, local map results, and other specialized features, depending on the specific search engine and query."
Google Search is in decline for the users who know how the internet works. For my mother, the internet is still Google (and would remain like that no matter what). For me, for some friends of mine, and for many of my colleagues who know more or less how things work on the web, Google Search is just in free fall: we use it as last resort, but as other search engines (or other tools, like ChatGPT) improve over time, Google Search would just disappear from our bookmarks.
I know that many of you would say that it doesn't really matter what "hackers" think about Google Search, that all that matters is that the majority of the non-tech-savvy users still think Internet == Google. Well, let's talk again in 5 years.
There was a magic period of time lasting about a decade when internet search Just Worked. If it was on the public web Google would find it. Search worked so well I took it for granted.
Today I avoid Google search at all costs, and use it mainly to find Wikipedia pages or to search Reddit.
The Googlers blame SEO and there is some truth to that; but Google has on retainer a huge stable of the world’s best-paid engineers, and still couldn’t be bothered to invest in their flagship service.
I agree, Google up until ~2013 felt magical and nowadays even news.ycombinator.com, a site frequented by many search engineers, isn't fully indexed anymore, not even 80% indexed for some obscure search terms.
It would be trivial to insert banner ads, or even "native ads" in the response.
LLMs are probably even better suited for monetisation since they have a better understanding of what people want, so a better ad can be shown that is more likely to be clicked.
Do you think people were losing their shit when ChatGPT went into bing simply because web search gets better/easier? No - people were losing their shit because it meant the ads in search are going to be turbo-charged and so that is why the share prices are surging (GOOG up 40% over last 6 months): more ad revenue from "better" ads shown to users.
People still need products to solve problems. When you ask an AI "I have X problem, what products could I use to resolve it?" that is a natural and ethical time to include advertisers into the equation. The LLM can be trained on product specifications, details, and relevant uses, as well as plug in to a review database. Companies can pay to be included in the results of possible solutions and the AI can use the available information to make specific recommendations based on real data.
Hopefully, the future will completely prune intrusive, non-consensual advertising, and any companies that inject thoughts into our minds will fail.
For some problems. For many more you don't need a "product" at all but advertising still wants to sell you one.
> that is a natural and ethical time to include advertisers into the equation
Not at all. Ads means showing the product of whoever is paying the most or at least preferring paying products over others. Ad-free suggestions means showing the product best suited for the task. If those two match you are defrauding the advertiser by making them pay to show what you would have shown without ads. If they don't match you are degrading the service for the users.
That’s because it’s been progressively getting worse for years. For example, when they launched Google+ they stopped supporting the + operator in favour of “quotes” and people complained about that. Of course now the “quotes” don’t work either.
Google used to be amazing. If you remembered a set of words that were on a page, you could enter those words and it would find matching pages. It’s been broken for years and keeps getting more broken.
Yes it's broken, try searching 'python abs "Return the absolute value of a number"'. That string is directly lifted from the Python docs for the `abs` function. The official Python documentation does not appear for me until the 3rd page and the "featured snippet" is from some random website called flexiple.com
Well that’s the problem then. You could say your problem has more to do with the relevancy of the results than the exact match operator (I personally don’t mind in this case since for the few times I’ve had to write python I’ve found the official docs not as useful as I’d expect).
As I said, I use that operator all the time and it works. I’d be the first to complain when it doesn’t. For instance, the - operator stopped working in YouTube a while ago, unfortunately.
Firefox did most of the early damage to IE all on its own, taking IE down from an insane 90+% marketshare position and steadily eating up 30% marketshare by the time Chrome even really showed up. And it didn't even have to stealth-install itself with every Adobe Flash update to do it.
There were "tiny upticks" but, going slowly and steadily, in August 2008 Firefox had gained about 30-33% and IE was down to about 60%. And then (September 2008) Chrome happened.
I certainly start with google far less -- I now, very often, stray little from my bookmarks bar, the first of which is "New Chat" (ChatGPT).
I needed some somewhat obscure API information recently, and ChatGPT had it -- a testament to how much ChatGPT really is just a compression of "everything ever digitised"
Does ChatGPT “have it”? I thought it made stuff up and that the magic it does is being very good at making the stuff up.
We’ve had it come up with solutions using entirely made up functions. They had names that sounded like something Microsoft would’ve put into .Net, but they were entirely made up. As in, they had never existed in any version of .Net ever.
So as much as I like it, I’m treating it with more caution than google results. Honestly though, most of the time it’s frankly faster to just read the damn manual and figure out things for yourself. I don’t say that as some sort of anti-prompt-programming purist, but wading through GPT responses is about as hopeless as wading through the gazillion medium, dev.to, stack overflow and whatever else people post their useless stuff on. 10 years ago if I needed to do some obscure Sharepoint programming (waaaaaay out of my field) I could realistically make something work with the help of Google; today the same thing is frankly completely impossible.
It's actually quite interesting just how good it is at making things up. When I first started using it I didn't realise it even could, so when I got a non-existent GDScript function, I started to prod it to see where it came from. It was able to explain the function, tell me when it was added (of course Godot is open source, so it would in theory have access to this) and even the commit hash used to add it, and all of it sounded very plausible. It was only when I pointed out that it doesn't exist that it admitted it.
Admittedly that was on GPT-3; I haven't tried GPT-4 as I can't afford it at the moment. No doubt it's better, but I'm not sure how much so.
It's always making things up. It's fundamentally a coincidence when it gets it right.
That's the nature of associative statistics -- and why this talk of "hallucination" is more marketing PR.
We hallucinate in that our reconstructions of our environment can be caused by internal states (e.g., dreaming) -- whereas veridical perceptual states are usually caused directly by the environment.
Here, its states (i.e., the statistical averaging process over its training data) are NEVER caused veridically -- i.e., its prompts are never caused by the environment.
GPT doesn't give correct answers. It gives answers that sound correct.
Those correct-sounding answers are often actually correct, but this is more coincidence than design. Anything it says is suspect, and should be fact-checked.
The more ChatGPT is just remembering its training content, the better it is. In this case it had clearly remembered just that API (an obscure educational API) -- with weird parameter key dictionaries etc. that aren't in any way some Intelligent Generalisation (oh wow! everyone fund this!!11!)
Insofar as it isn't just a regurgitation of "the better ebooks, blogs and docs", the worse it is.
This is why, when prompting ChatGPT, I'm more often aiming to have it use examples (etc.) that have a high likelihood of exact data in its training set.
Consider eg., the prompt, "write tic-tac-toe in javascript using html and canvas" to "write duck hunt in javascript using html and canvas"
The latter is extremely hard to get out of it, with many prompts -- the former, immediate and perfect.
Why? Because there are many examples of tic-tac-toe.
> Does ChatGPT “have it”? I thought it made stuff up and that the magic it does is being very good at making the stuff up
ChatGPT is just a frontend UI and refers to two different models (and different tunings of these models over time I guess).
GPT3.5 is kind of useful but it makes up stuff just enough that you can't really trust it and spend so much time verifying it's hard to say whether you saved time vs the old way. But it still produces mostly not made up stuff.
GPT-4, still not perfect, is a game changer though. It's what people generally mean when they talk about ChatGPT. There's far, far less hallucination going on (not zero though). There are several ways to access it: phind, bing, ChatGPT. But I still think ChatGPT is the best.
Genuine question: if Google is a last resort, what are your more preferred alternatives? I've tried many search engines and haven't really succeeded in finding a better alternative. Wondering if I'm missing something.
Kagi (paid) and Brave search have both been good to me.
I never really liked DDG or Bing. They both suffer from the same SEO gaming that Google does.
There are also a number of smaller search engines that have been posted to HN that are kind of interesting for certain niches.
I think what will happen over the next five years is that instead of Google being the one-stop shop for search, it will be a number of smaller players + the different chatbot engines like ChatGPT and Bard and others.
I tried many alternatives, and finally settled on Kagi Search. It's a paid option but it's well worth it. They also have tools like FastGPT which I'm using more and more. It's a GPT model that has access to their search results. You can't use it for conversation like you'd do with ChatGPT, but for searching and summarizing it's amazing.
What search engines would you recommend? I'm having difficulty using google, its just practically useless these days. ChatGPT is much much better but I'd like to know other tools
I think it's actually a "u"-shaped curve. For people with no idea, Google works great. For people with vague and mostly inaccurate knowledge of how to index and search the web, it appears to suck. For people who actually know, it once again seems good.
It seems obvious to me the importance of organic results has fallen over the years. There's plenty of queries where you have to scroll down a fair amount to even see them. Pushed down in favor of ads, various widgets, content sourced from Wikipedia and published on google's urls, etc. Things that sit below the fold get less overall money, time, and attention.
With generative AI search results, soon you won't even be able to know whether your site was used for the result or not. Lots of no-click queries, resulting in no traffic for the publisher.
The much more worrying fact about AI is you won't even be able to know whether the information you're getting is true. I always scroll past the crap at the top to get to the actual site results.
Given the web as it is today is infested with clickbait, "native content", clout-chasing, undisclosed sponsorship, and other such pathologies, I'm not convinced that the fear of AI making truth more rare is rational.
Yeah, had this happen already. I remembered that some normally purely carnivorous type of animal had a herbivorous species. It was spiders, but I searched for snakes first. One of the top results was an article about how boas can be fed a diet of fruit (they cannot), which must have been AI-written, given how many other semi-nonsense articles that site had.
This kind of problem is also showing up on Quora. Some of the answers I've spotted are so obviously wrong that you can tell a person didn't write them.
That might have been worrying if “I” was known to reliably provide true information, but it never has, so we’re used to knowing that information probably isn’t. Adding an “A” to the equation changes nothing.
LLMs are significantly less likely to be accurate but are quite good at fooling people. The problem is our existing BS detectors no longer work well. It's surprisingly close to talking to a talented con man.
Was it? Go down to the local coffeeshop for the daily gossip and you'll hear all kinds of things that aren't true. People love to make stuff up and we have always needed to rely on the concept of credible sources. There is nothing to suggest those are going away.
I asked ChatGPT to give me links to some sources for one of its answers and it responded it didn't have access to the internet. I think this could be "solveable" by adding a "show your work" or "provide references" kind of feature in a future iteration.
They even describe how to mark up your paywalled stuff to help them differentiate it from cloaking.[1]
"This structured data helps Google differentiate paywalled content from the practice of cloaking, which violates spam policies."
Which is just odd to me. Why present a paywalled search result, when the market is so fragmented that the odds the user has a subscription are so small?
Feels like a no win situation for Google. Do you show results that people don't want to pay for, or do you not show any subscription needed results and people accuse you of monopoly behavior only showing sites running adsense.
Right? It doesn't have to fully elide the result even. If the top of the SERP said "3 results hidden because they are paid sites on your non-subscribed list" I would appreciate the info.
They had a simple rule that they index what users can see and if you try to cheat your way around that you get ranked down. Sticking to that for everyone would have been an iron clad defense against monopoly accusations. Instead they chose to help certain corporations (or at least corporations with certain business models) present different content to the bot and users.
Add Steam forums to those. Nine times out of ten, game walkthroughs or hints can only be found on Steam, according to whichever search engine. But only if you have a login there, of course...
Indeed, I just checked, it's not the entire Steam community site. Looks like it's a per-subcommunity (per-game?) setting whether the community section is open to the public or not. Luckily there's a lot of content that's not behind a login wall.
Cloudflare doesn't cache pages by default as that would create issues (logins, comments, admin sections, etc). They cache static files (images, css, js, etc), but that's it. You need to set it up yourself and have ways to bypass and purge the cache.
> Cloudflare doesn't cache pages by default as that would create issues (logins, comments, admin sections, etc). They cache static files (images, css, js, etc), but that's it. You need to set it up yourself and have ways to bypass and purge the cache.
Or they could just rely on Cache-Control headers (and friends) instead of requiring CF-specific configuration.
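As a rough illustration of that approach, here's a minimal sketch (using Flask purely as an example framework, not anything mentioned above) of an origin setting standard Cache-Control headers so that any cache honoring them can decide what to keep:

```python
from flask import Flask, make_response

app = Flask(__name__)

@app.route("/article/<slug>")
def article(slug):
    # Public content: any intermediary cache that honors standard headers
    # may keep it for a while.
    resp = make_response(f"<h1>{slug}</h1>")
    resp.headers["Cache-Control"] = "public, max-age=300, s-maxage=3600"
    return resp

@app.route("/account")
def account():
    # Personalized content: tell every cache to leave it alone.
    resp = make_response("account page")
    resp.headers["Cache-Control"] = "private, no-store"
    return resp

if __name__ == "__main__":
    app.run()  # quick local check only
```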
I also noticed it takes way more time to index sites.
When I moved bitecode.substack.com to https://bitecode.dev, I submitted the sitemap to google and was surprised that it took almost a week to even show a single page as indexed.
This used to take a couple of hours to a couple of days.
So something definitely changed, but as usual we can only speculate until a googler come in this thread.
Just more spam and scams than they can handle. 0.5% of all the shit on the Internet is real content. And that stat is from 2015.
Their own fault for monetizing every imaginable search query.
It has provided the incentive for every spammer and scammer on the planet to mass-produce copious amounts of fake interlinking pages, massively larger than the actual Internet.
Vernor Vinge was wrong about humanity's vile offspring being stock trading bots, and the sci-fi authors were wrong about Grey Goo being a nanobot scenario. We're awash in grey goo vile offspring right now, consuming resources at an increasing rate and turning everything bad - and it's information; SEO spam and content farms and LLM waffle.
Pretty soon the only way to live will be an Encyclopaedia, text books, and cookbooks from before 2010.
It's not a static world. Things are always changing. I have some faith Google (the ad-tech part) will start crumbling. ChatGPT has given them a jolt, and there are more jolts to come, not fewer: from the regulators, and from the advertisers who know most of the views they get are from bots or totally worthless, and that they don't need to be spending what they are spending.
I'm not sure this fixes anything. GPT/LLM will be able to produce monumental amounts of reasonable sounding bullshit. Google may crumble and fail, but all we'll be left with is small islands of sanctuary in an ocean of bullshit.
Google contributes to the bullshit explosion by incentivizing people to game the rankings and results. Same with Facebook's newsfeed, and everyone else that relies on ads to provide their "free" services. Everyone is gaming these services because there is money to be made if you end up at the top. Including HN. The failed assumption being: whatever is at the top = quality. These solutions haven't solved the information explosion problem. They have made it worse.
Ironically, the free services like search ranking, news feeds, like counting etc. were the initial half-thought-out response to the info/content explosion the early internet produced. They just forgot that was their goal and moved the goal posts from getting a handle on the info explosion to rewarding attention capture, maximising engagement and other useless shit that contributes further to info pollution.
So when Google/FB etc. start crumbling, incentives for the spammers and scammers will start dropping too.
This seems to be a naive point of view. Google is a giant piece of shit, but to think they are the only piece of shit is a failure of imagination at the highest order.
The billions Google controls don't disappear if Google does. Instead every scammer starts looking for new markets to spread spam in. And this is just marketing; we're not even talking about politics and the firehose of falsehood.
The problem may not be solvable which is concerning as we are at risk of drowning in bullshit.
I'm not denying that, or saying everything is going to change in an instant. I am saying we are past the point where Google and the attention economy they raised keep running the way they did for the last 20 years. We are going to see flat revenues, more layoffs, fewer data centers getting built, fewer "free services", multiple large countries bringing out legislation on how personal data and algos can be used, more huge fines, etc. There is even a UN report released this year talking about how the attention economy should really work, covering issues on the social, political, ethical and economic front. On top of it, the telcos worldwide are all in huge debt. They have been trapped in a cycle of building/upgrading the pipes on the belief that all the data flowing through them is oil. But if it's all sewage, then how long does the cycle run? It's going to break down.
The attention economy is the old news now... welcome to the intimacy economy brought to you by AI.
And data was the oil... it was saved in lakes and used to build the electronic minds we have now created, and those electronic minds are going to work as nuclear reactors that will process the data we now create, and the data they will create as they are embodied and put into the world. It is going to break down, but not in a way where we all go back to a disconnected world without surveillance capitalism; no, the future is going to be far worse than you can imagine when it comes to that.
>Telcos worldwide are all in huge debt.
This has nothing really to do with the economics of selling data; it has to do with companies taking out far larger loans than they needed and using them to enrich their shareholders.
This comment is now the top and only search result for "but as usual we can only speculate until a googler come in this thread." minutes after you posted it. I don't think freshness of the index is a criticism you can credibly use against Google.
> I have been a web designer since 2016, and before that I was a blogger for 6 plus years. I have been deeply interested in Google SEO since 2010, and in all that time I’ve never heard of Google not indexing a site.
Google crawls sites it assumes will have useful content. If it doesn't have that assumption for your site, you need to feed it.
The convention site in question had a GA tag, it was in Search Console, and I had submitted a sitemap. Search Console knew the site had 9 pages in the sitemap, and had only indexed the homepage.
The other pages weren't rejected, declined or 404ed - they were simply ignored.
> If you want to be indexed, submit your pages to Google explicitly through a GA tag, a Google Search Console account, and a sitemap.
What if I don't want to put any Google-related stuff on my site?
I have a GitHub Pages site on a very unique topic and it's still not indexed by Google (the site has been available for at least 8 months now). I've checked Bing and my site is in 4th place on the results page.
So are we all now officially complaining that it stinks where we've SEO-sh*tted for two decades?
Good, then let's get rid of this ad-network dystopia which never worked as advertised (they show me special offers for a frying pan two weeks after I ordered one; laughable, folks). Replace it with great, authentic, inspirational content and people will throw their 2 bucks at it.
What's the point of censoring only one character from the word "shitted"? Both you and everyone that reads your comment know exactly what you are saying, so it's not like you are shielding anyone from anything. And neither is there anyone who is going to punish you for saying a naughty word on the internet. What are you afraid is going to happen if you write "shitted"?
That is a disappointing answer. I was hoping for the truth, but got a poor attempt at humour instead.
I genuinely want to know the real answer, because this kind of behaviour confuses me. And every time I ask someone about it they don't answer, or get a non-answer like yours.
If HN blocked comments with certain words, I'd understand.
If you genuinely find shit an offensive word, and instead chose to use some other word instead, I would understand.
Too bad that the powers that be have decided that payments must be identified, so such a new internet will by necessity not be anonymous, but instead tracked and identified in every aspect.
There's a difference between me deciding which payment method I use for service X and having service X decide into which tracking networks they put me in.
Additionally, Mullvad showed how to use cash via regular post to pay 100% anonymously for service X.
Difficult to comment without URLs, there's too many potential causes
Unfortunately "[being] a web designer since 2016" is generally not helpful for SEO
But certainly setting up Google Search Console, submitting a sitemap, and 'fetching' your homepage on day 1 is good practice, especially for a brand new website
I can also vouch that what the author is saying is true. I've started publishing my notes, and I found that after three weeks none of them were indexed (and yes I submit a sitemap through google webmaster tools). I have since started using the URL inspector tool and adding them one by one. That does work.
If you're mass-producing low-quality sites you would do that on day 1 with each of them. So it likely doesn't weigh positively as much as we would hope.
Once again the black hats will come out on top. As someone in that industry, there's always a new trick or method for getting your sites indexed en masse before others. If you don't have time to find it, someone in a 2nd world country (probably India) will offer the service relatively cheap at BlackHatWorld.
On a side note on the SERP (search engine results page) of the foreseeable future, I don't accept the doom and gloom other SEOs have. Google has shown it will give credit to the top 3 links, and to be honest pretty visible credit. This will shift the target from "front page" to "top 3" sort of like Local Seo Map Packs.
Will make difficult keywords more difficult to get clicks for, but long tail keywords could be easier to rank now as a #3 ranking could garner as many clicks as a #1 ranking.
Google is an advertising & data-collection business, not a search company. Search is their on-ramp for attracting sources of data. If your website is not worth much on either side of the business, you'll be a low-priority target.
Maybe the concept of "web search" needs to be reimagined all together?
Perhaps the idea of creating a master index of the content of multiple pages on nearly every site is the thing that is not sustainable. Instead, "search this site" needs to be made great again. Individual websites could manage their own search in a way that complies with a standard API that can be consumed by meta-search engines. Rather than indexing pages in the traditional way, meta-search engines instead use a heuristic or AI model to decide which sites are going to have the kind of information you are searching for, perform your query on their own search, and return the aggregated results to the user.

The less the algorithm understands the significance or meaning of the query, the more generalized its approach can be. For instance, if it thinks that you're searching specifically for opinion-based content that will appear on blogs and forums, then it will target a federated search engine that indexes those things specifically. But if you are searching for information on making beer at home, it will know to target and weight the search engines on brewersfriend.com and homebrew.stackexchange.com.

Although this sounds not that different from how search works today, remember that this idea is about having search become more federated and more standardized, and for meta-search to select federated indexes rather than own a god-index. A user of a meta-search can pick and choose what indexes they want available in their searches in case they find any of them to be either superior or particularly problematic, and the meta-search can optionally adjust its understanding of a particular user's queries.
The way I see it, traditional search will continue to decline in part because it's not sustainable, but also in response to AI allowing them to become "answer engines". Although a lot of people do want an answer engine, this isn't for everyone. I think there will always be a market for people looking for content on specific webpages. Whatever that thing is that someday snatches that market away from The Google, if it's going to be successful, won't survive on the current concept of what "search" is.
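To make the proposal above a bit more concrete, here's a minimal sketch of what the meta-search side could look like, assuming (hypothetically) that each federated index exposed a common /search?q= JSON endpoint. The endpoints, response shape, and scoring below are invented for illustration; no such standard exists today:

```python
import concurrent.futures
import requests

# Hypothetical federated indexes exposing a common /search?q= JSON API.
# The .example domains are placeholders, not real services.
FEDERATED_INDEXES = [
    "https://search.brewersfriend.example/search",
    "https://search.homebrew-forum.example/search",
]

def query_index(endpoint: str, query: str) -> list:
    """Ask one site-level index for results; tolerate failures."""
    try:
        resp = requests.get(endpoint, params={"q": query}, timeout=3)
        resp.raise_for_status()
        return resp.json().get("results", [])
    except requests.RequestException:
        return []

def meta_search(query: str) -> list:
    """Fan the query out to every federated index and merge by score."""
    with concurrent.futures.ThreadPoolExecutor() as pool:
        batches = list(pool.map(lambda ep: query_index(ep, query), FEDERATED_INDEXES))
    merged = [hit for batch in batches for hit in batch]
    # Each index is assumed to return {"url": ..., "title": ..., "score": ...}.
    return sorted(merged, key=lambda hit: hit.get("score", 0), reverse=True)

if __name__ == "__main__":
    for hit in meta_search("dry hopping timing"):
        print(hit.get("score"), hit.get("url"))
```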
My guess is that Google doesn't care (not that they ever did but now less than they did before). The simple math is "Google makes no money by indexing your website." The easiest way in the past to get reliably indexed by Google was to put Google ads on your web site. Then they have skin in the game so to speak. "Long tail" websites, those that are very niche, not especially linked outside of a few folks, have only ever been there to impress people using Google into believing they index everything (which they don't, and frankly haven't for at least the last 10 years).
Over the years, as the ways that "search engine" operators exploited the "digital exhaust" of the people who used them and sold that information for profit have been revealed, regulations and laws have been enacted to reduce or eliminate the most egregious (and usually most profitable) practices. Between that and advertisers tiring of paying for "engagement" while seeing few sales or conversions, the days of running a search engine as a high-margin, make-money-hand-over-fist business are slowly winding down.
That Bing has (for now at least) continued to index these sites will just accelerate Google's decline.
When I was helping to run Blekko, people would look at our curated results and be super impressed at how much better they were than Bing or Google results, but then they would search for their cousin's minecraft blog and wouldn't find it and lament that they couldn't possibly use it as their "daily driver" because if it didn't have their cousin's website in it, how could they know what they weren't seeing? Blekko tried really hard to make the argument that if you made a list of the minecraft blogs you followed as a Blekko user we'd index all of them, and if everyone did that for their favorite stuff then the index would fill up with good web sites and not be riddled with junk. But sadly we couldn't get them to internalize that and they weren't willing to create an account to have a better web experience. Perhaps we were just too early but still it is a weird thing.
I think it's important to realize that search engines both bring websites to people and people to websites. The genius part of Google's position in the market is that they both sell ads, and then direct people to the places where the ads are sold. This is also a major contributor to their search engine spam problem; they can't well penalize ads without undermining their core business model.
Although I don't think any search engine has ever indexed or will ever index everything. That doesn't really make sense. Realistically I think maybe 1% of the documents online are ever going to be a good search result for any query ever.
Internet search is all about being judicious about what you index.
>>> I think it's important to realize that search engines both bring websites to people and people to websites.
I don't disagree with this, I was simply pointing out that Google makes no money by sending people to a web site unless the owner of that web site has bought an advertisement and the person involved has clicked on that ad. Organic search, like "good explanation for approximation theory" should have something like this: https://xn--2-umb.com/22/approximation/ as one of its top results but it doesn't even make it to the first page.
The caveat is that you can only request that Google index up to 200 URLs.
It's also technically not the most 'correct' way to have your pages indexed (it's supposedly meant for job posting updates, etc.).
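For context, that quota and the "job postings" caveat refer to Google's Indexing API. A hedged sketch of what a submission looks like is below; the service-account file path and the example URL are placeholders:

```python
from google.oauth2 import service_account
from google.auth.transport.requests import AuthorizedSession

# Hedged sketch of Google's Indexing API, which is what the ~200-URL quota
# refers to. Officially it is only intended for JobPosting/BroadcastEvent
# pages. The service-account file path and the URL below are placeholders.
SCOPES = ["https://www.googleapis.com/auth/indexing"]
ENDPOINT = "https://indexing.googleapis.com/v3/urlNotifications:publish"

def request_indexing(session: AuthorizedSession, url: str) -> dict:
    """Notify Google that a URL was added or updated."""
    response = session.post(ENDPOINT, json={"url": url, "type": "URL_UPDATED"})
    response.raise_for_status()
    return response.json()

if __name__ == "__main__":
    credentials = service_account.Credentials.from_service_account_file(
        "service-account.json", scopes=SCOPES  # placeholder path
    )
    session = AuthorizedSession(credentials)
    print(request_indexing(session, "https://example.com/some-page"))
```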
I don't think this is a bug. After all, why waste resources indexing pages 2 .... n when nobody even goes there? If something accrues signals outside of Google naturally, then I am sure it will get indexed. On this note, I don't want to be another 'SEO is Dead' person, because it is never dead, it just changes, but I do suspect over the longer term, links will become less relevant and Google will take a more human-curated approach to ranking selection using LLMs.
Well, I can't say for everyone, but I did notice that Google stopped indexing a few sites I manage. No technical reason, no error; it just says "Not indexed - reason: Discovered - currently not indexed", and that's most of the content of the site (nothing fancy, mostly technical articles and other notes). Unclear why this is happening.
Things are pretty bad with Google in this department. Over the last two years they have been constantly updating and changing their algorithm, there are now updates such as Core, Product Reviews, Helpful, Spam, Link Spam and God knows what else. And they are rotating them 24/7. Literally.
You could be up 20% one day and down 40% the next and you are none the wiser as to why.
If you are a small time publisher with no budget or no craftiness to attract links from other sites - you are pretty much doomed.
I have a few sites that are in the 200k monthly visits range (from Google) and Google only fetches the homepage/feed every four hours or so, sometimes it takes longer than that. It’s a lot different from what it used to be.
I have a year-old WP site that showed up on Google, but not all pages were being indexed. I submitted the .xml sitemap via
https://www.google.com/ping?sitemap="url"
But that didn't seem fully effective. I then succumbed to uploading a verification file to the root directory (requires a Google account) and resubmitted an index request. Within 12 hours a Google search of
Site:"my url"
yielded all pages. I'm terribly rusty with websites, hence my use of WordPress and willingness to taint my root directory with Google files. I do notice that exact, relevant queries in quotes still show no results for some content. Much to relearn.
I don't remember the initial loc. It's now in the root dir as mentioned. However, I'm still not pleased with the results, but perhaps it takes time. I'm sure a backlink or two wouldn't hurt.
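For anyone curious, the sitemap "ping" mentioned above is just a GET request; a small sketch follows, with the sitemap URL as a placeholder. Note that Google has since announced this ping endpoint is deprecated, so submitting the sitemap through Search Console is the more reliable route:

```python
import urllib.error
import urllib.parse
import urllib.request

# Sketch of the sitemap ping described above. Google has announced this
# endpoint is deprecated, so don't be surprised if it stops answering.
# The sitemap URL below is a placeholder.
SITEMAP_URL = "https://example.com/sitemap.xml"
PING = "https://www.google.com/ping?sitemap=" + urllib.parse.quote(SITEMAP_URL, safe="")

try:
    with urllib.request.urlopen(PING, timeout=10) as response:
        print("ping returned HTTP", response.status)
except urllib.error.HTTPError as err:
    print("ping rejected:", err.code)
```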
I've done everything right according to Google, set up the search console, uploaded a site map, addressed all mobile usability issues. And yet only a tiny fraction of my content is being indexed.
I'm a bit at the end of my rope here as I've poured a year into this project and getting a historically normal amount of search traffic may be the difference between this project being viable or not. The most frustrating part here is having zero visibility into what's going on.
Your site has very little "content" and is just product listings, which looks like a million other garbage sites Google sees. Great for UX to have the product listings users want, but the crawlers need some plain text to read to understand what your site is about.
I'd add a paragraph, a hero image, a CTA, etc. at the very top explaining what your site is. Additionally you need a menu at the top and a footer at the bottom with links to additional content-only (not product) pages - i.e. an about us, where we source data, etc. Even 2 blog posts would help a ton. Do not stuff them with keywords, but be sure to use the words that are common in your niche of the industry so your site gets associated with industry sites.
In Google's eyes, the difference between a scammy online ecom site and your site is hard to see! (Even if your site provides legitimate value to users.)
You can try posting a link to it on various digital marketing subreddits (not the "rate my website" ones) to see if you can get more feedback - I haven't done that in years though.
Edit - also, I didn't realize clicking product links takes you to external sites. That's a tough site for Google to ever understand correctly, since you have so little content and the best possible outcome of a user visiting your site is that they leave it. Maybe set it up to have each product link to your own page for it, maybe with price history from camelcamelcamel, links to the product on other sites, generic info about what site it's listed on, or just crawl the description at the vendor's site.
From a UX perspective, your site is perfectly fine. From an SEO perspective, though, the brand/category pages don't have enough content (in Google's opinion) to be uniquely relevant. Even though a user would find the faceting/filtering functionality highly useful, Google uses things like word count, TF-IDF and topic relevancy as signals (albeit easily gamed) to surface "relevant" pages. This is why recipe sites all have 500+ word intros before each recipe and even more on category pages.
Backlinks also matter, for both domain and page authority. You are competing with 20+year old domains from large companies-- why should you (or any new site) get ranked before dickssportinggoods.com who have top tier backlinks (graph network, implies trustworthiness) from sites like Espn.com? Google likely uses CrUX data for ranking (because 2023 backlinking is vastly different than 2012), so high engagement from users is likely a KPI to focus on, in addition to backlinks (both branded and inclusive of terms/pages you want to rank for).
There are obviously hundreds of other factors (with different weights), with dozens or hundreds of tests running at any given time, but those few factors are what have remained relatively consistent over time. That is partially why Google's results are so bad: it is only a matter of time for people to figure out what matters and then optimize against it. What is best for the user may not be the best for Google, sites or advertisers. Unfortunately, many times the best content isn't visible, because people capable of marketing have a leg up versus those who just want to provide utility.
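As a toy illustration of the TF-IDF style signal mentioned above (the two category-page snippets are invented, not real crawl data), a thin listing page simply carries no weight for most of the vocabulary a richer page covers:

```python
import math
import re
from collections import Counter

# Toy illustration of a TF-IDF style "content depth" signal. The two
# category-page snippets below are made up, not real crawl data.
docs = {
    "thin-category": "running shoes sale discount",
    "rich-category": (
        "running shoes guide covering trail versus road shoes, sizing tips, "
        "cushioning, heel drop, and how each brand times its discounts"
    ),
}

def tokenize(text: str) -> list:
    """Lowercase word tokens with punctuation stripped."""
    return re.findall(r"[a-z]+", text.lower())

def tf_idf(term: str, doc: str, corpus: dict) -> float:
    """Term frequency in one doc, weighted by the term's rarity in the corpus."""
    words = tokenize(doc)
    tf = Counter(words)[term] / max(len(words), 1)
    containing = sum(1 for text in corpus.values() if term in tokenize(text))
    idf = math.log((1 + len(corpus)) / (1 + containing)) + 1
    return tf * idf

for name, text in docs.items():
    score = tf_idf("cushioning", text, docs)
    print(f"{name}: {len(tokenize(text))} words, tf-idf('cushioning') = {score:.3f}")
```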
I think the issue here is that whilst your site may have utility, it has zero original/unique content. To Google, this looks like a link farm.
You might argue that your discount price is "unique content" but good luck getting Google to understand that. Plus, I imagine those shoe names are competitive queries, which means you're up against paid ads. Further, I assume your discounts are time-sensitive, which means unreliable indexing is not a good solution for up-to-date pricing.
Yup, this confused me the other day - I tried to search for an article I'd written for Dr. Dobb's Journal in 2001 on a very specific topic. No amount of keywords, including my name, technical keywords etc., could find it.
I assumed that meant it had been deleted from the web; the publication had disappeared 10 years ago. Then I saw I'd linked to it from my own site, and the link worked just fine :facepalm:
It's weird to see Google forget the kind of content I used to go to them for in the first place.
I've seen this on a site that I built for a freelance client a couple years ago. At first they only indexed the homepage and some sporadic articles. Eventually one specific article was linked on an .edu website, and now that article is reliably indexed as well.
If I manually request that they index a page, it always succeeds and shows up on Google within a few days, but the page gets pruned from the index within a month or two.
The weird thing is that within the past couple years I've also seen unwanted indexing, e.g. low-spec staging servers getting wrecked by Google crawler traffic. I don't think the "no automatic indexing" treatment is standard for every site, so there must be something that triggers it, but I've spent a long time unsuccessfully trying to pin down the cause.
For some reason it refuses to index my site(s) until I link the site to my Google account in Search Console. I had organically placed a lot of links (on my GH and LinkedIn profiles, no spam anywhere). It did not index even those links for over a month, despite me searching the exact domain multiple times. I guess doing it via Search Console just showed it the exact domains it has to crawl, making its job easier.
Well, I can't really comment without a URL, but one thing I found majorly annoying is how sticky "noindex" has become. It used to be that you could put your staging site on noindex, put it out there, run a lot of external testing tools (PageSpeed Insights, mobile-friendly tools, webpagetest.org, 10000 other tests that can easily be done on the open internet) and send the page around, and then on publishing day just remove the noindex, trigger recrawling, and be indexed and visible in Google in no time.
Now it takes weeks to months for the noindex to vanish. Even after Google has crawled the pages again and again, as visible in the logfiles, the pages stay noindexed even though the directive is long gone.
The past couple weeks there have been a few times I've been completely startled that google hadn't indexed a site. One of them is a forum that google used to completely index and now I have to use the forum search itself to find old posts.
My personal guess: with ChatGPT and other locally runnable LLMs being commonplace now, Google just can't keep up anymore. There must be millions upon millions of SEO spam pages being created and updated every day.
They lost the battle, at least for now.
This also lines up with Google Search results becoming increasingly worse over the last few years, but the last months in particular.
Hmm, this is the best guess here IMO. This person did say it started occurring within the last year though. I just created a site a month ago with 2000+ pages and I did notice only 5 were indexed, which was quite annoying. I just suspected it was because I didn't have enough backlinks yet, so it didn't want to waste time on it... interesting that it may be what this original article is about. But yes, I 100% bet it's because all new sites now face an enormous barrier with the sheer amount of fake content that can be generated.
There are an infinite number of URLs on the internet, Google obviously cannot crawl and index them all. They choose what to crawl and index based on mainly site reputation and inbound links, just like they used to 20 years ago.
If they aren't crawling or indexing your site, then link to it from a high reputation place.
I never thought they did it 'automatically'. My impression was that they always had to have found it via a link from another site, or that the site owner had to submit the site to Google. It isn't like Google is seeking out all registered domains and checking each one to see if a site suddenly appeared on it.
So if someone registers a new domain, puts a site on it, doesn't link to it from anywhere else, doesn't submit it to Google... and yet expects the site to be found by Google, that is just not a reasonable expectation.
Google has to know about the site before it can index it.
Set up the sitemap, then submit it through Google Search Console, and install Google Analytics. This will help Google pick up that your site exists.
Make sure your robots.txt file is configured to allow crawlers. Make sure your pages aren't inadvertently NOINDEX'd.
SEO isn't as relevant as it used to be, but all this stuff should be part of your QA and pre-launch checklist.
Set up uptime monitors on every page that gets more than 1% of your traffic... check page load speeds and HTTP response codes -- you never know when a WAF or some other system will get mucked up.
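A rough sketch of that kind of check, with placeholder URLs, might look like this: hit each important page, record the status code and a crude load time, and flag any accidentally shipped noindex header or meta tag:

```python
import time
import requests

# Rough sketch of the pre-launch checks described above: HTTP status, a
# crude load-time number, and accidentally shipped noindex directives
# (header or meta tag). The URL list is a placeholder.
PAGES = [
    "https://example.com/",
    "https://example.com/about",
]

def check(url: str) -> None:
    start = time.monotonic()
    resp = requests.get(url, timeout=10)
    elapsed = time.monotonic() - start
    header_noindex = "noindex" in resp.headers.get("X-Robots-Tag", "").lower()
    # Very crude meta check -- good enough to flag pages worth a closer look.
    meta_noindex = "noindex" in resp.text.lower() and "robots" in resp.text.lower()
    print(f"{url}: HTTP {resp.status_code}, {elapsed:.2f}s, "
          f"noindex={header_noindex or meta_noindex}")

for page in PAGES:
    check(page)
```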
My website used to be hosted by Google Sites. Then I moved it to something more "real" and deleted the Google sites version. The indexer still looks at their cache of the last version that was in sites from 5 or 7 years ago.
So... the Google indexer doesn't work? SHOCKED! I AM SHOCKED TO FIND GOOGLE DOESN'T WORK!
I think search results have become worse over time for three main reasons, firstly the amount of absolute drivel that's published on the web, made doubly worse with AI written drivel. It's honestly shocking. Secondly, the web has become more centralised and people use aggregators more than individual sites or services (facebook, reddit, etc.). Thirdly, the internet has become 'safer' in a lot of ways and seemingly the search engines scrub what is probably the majority of results from ever being returned. It used to be quite easy to find a pdf of a book on some open web server from a google search alone, now it's nearly impossible. And I don't think that's because people's security hygiene has improved.
Indeed this very article doesn't come up in searches like [ddg site:natehoffelder.com] or [noindex site:natehoffelder.com]. And the article has been up since at least May 30. So yeah, looks like an outage at Google on the crawling side.
Google is part of a profit-maximizing company. Slowly, they are transitioning from showing relevant results and some ad in a separate box to results that maximize profit directly because they’re sponsored or whatever. The founders even explained it in one of their papers.
Speculation:
Crawling the web is expensive. So it makes sense that they decide for each crawled page if it's profitable to put that into the index. In the long run they might not crawl the web at all. People will just pay them to get into the index.
Google has been so weirdly different and off lately. I miss when I could type anything and get exactly what I was looking for. Those days died at least ten years ago.
Sidenote: I wish Google would lower the ranking on websites who push for a mobile app and degrade your web browsing experience. There's no point in even getting them as a search result. If I wanted something from an app, I would have just gone on the app store.
Another thing I noticed: a weird prioritization of certain sources. E.g., I have a website that has now been up for 24 years (there's even some content older than this, which has moved there); it has my name on it and my name is in the metadata, and - quite naturally - it used to be the top result when you googled me.
Recently, I discovered that the top results are now some university pages that are not maintained, or even have never been set up. Which really undermines the purpose of the site. (Notably, the site has quite a high number of reported Google search hits, has rather perfect Lighthouse scores, and SEO ratings are high enough that I get SEO related requests – which are happily ignored – on a rather regular basis. It scores about rank 60K in the "Majestic Million". And the site enjoys a few updates/fresh content a few times a year. So this is not a case of obscurity.)
I have recently started a new web project ( https://industrydecarbonization.com/ ) where I am relatively closely following how it does on search engines, and I can't say that I share those experiences.
It took a few weeks until google noticed that it's a page with relevant content, but that's kinda expected. But once it did I feel Google is indexing my pages extremely fast, so fast that I have been wondering how they're actually doing this. I post new content on various social media sites, and my best guess is that google gets some of them as direct feeds that they check for interesting links. Google does not support indexnow, and as far as I know also no similar feature (except manually via the search console), so I'm not in any way directly submitting my content to Google.
That is why I wrote [1] for myself. It stores links in a database, which I can query. Everything is later exported, as in [2] and [3]. I can browse history, and I can find useful data. I don't say it has replaced Google for me; it is a nice addition that has helped me gather data I encounter on the Internet.
It is a link database, at first glance resembles Reddit clone, but my focus is on creating link database, not on providing social media experience cancer.
An alternative web search architecture is collaboratively sharing the sites we make or visit.
I made a self-hosted spider that only indexes sites I've visited, and a search engine with "and"/"or" logic.
I'm thinking of making it a federated search engine by allowing each user to whitelist domains that aren't sensitive to search with peers. And users can follow/block other users explicitly to avoid spam.
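Not the commenter's actual code, but a minimal sketch of that idea: an inverted index over pages you've actually visited, with simple AND/OR query logic. The indexed pages below are placeholders:

```python
from collections import defaultdict

# Minimal sketch of "index only what I've visited" with AND/OR queries.
# The pages added below are made-up placeholders.
index = defaultdict(set)

def add_page(url: str, text: str) -> None:
    """Index a page that was actually visited."""
    for word in text.lower().split():
        index[word].add(url)

def search(query: str) -> set:
    """Support queries like 'foo AND bar' or 'foo OR bar'."""
    if " AND " in query:
        terms = query.split(" AND ")
        results = [index[t.strip().lower()] for t in terms]
        return set.intersection(*results)
    terms = query.split(" OR ")
    return set.union(*(index[t.strip().lower()] for t in terms))

add_page("https://example.com/godot-notes", "godot gdscript signals tutorial")
add_page("https://example.com/search-rant", "google indexing complaints")
print(search("godot AND tutorial"))
print(search("godot OR google"))
```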
I have a personal web site with almost 1500 pages containing blog posts, newsletters, reviews, and articles going back to the late eighties. I've been slowly consolidating all my writing on this site. For years I had kind of assumed that if I built it, (eventually) it would be discoverable via organic search. I recently determined that I had to put a Google ID on it and a sitemap file. So I've found scripts, imperfect though they are, to crawl everything and make a sitemap. Google has indexed some of it, but seems to have stalled out, with the bulk of the pages stuck in the "Discovered, currently not indexed" state, and now I'm not sure if some of the best content will _ever_ be searchable via Google. Really disappointing.
> and in all that time I’ve never heard of Google not indexing a site. […] Google used to just index every site whether you wanted them to or not
Google can only index what they know of directly, or can indirectly discover e.g., by links. I have unindexed sites because, well, they're not linked to on the public Internet.
I can pretty easily see a 2 month old website being outside that; the question is does the rest of the Internet know about it? (And, the public Internet. A shared link in a Discord channel also isn't visible to search engines…) Some of the examples contain statements that indicate Google could or should know of them, but not all of them.
I've noticed this for a side project I run, https://cfbpedia.com/. I upgraded its server a while back and unknowingly borked its SSL redirect. I corrected it and Google hasn't picked it up again since, despite my fiddling in every way in their Console; it's still just "not indexed". If you search for it directly in Google you will find links to it, but never the site itself. I figured I was doing something wrong and didn't care enough to fix it, but it definitely seems like Google is making some sort of attempt at pruning.
I have a website that is #1 when it comes to the problem I'm solving. No one comes close.
It's mostly because it's a money-saving thing for consumers, and that isn't really profitable. It's low-hanging fruit, and I have the best website for it. Nothing really comes close; most alternative websites make mistakes in their advice because they are using feelings rather than millions of entries of data.
Anyway, if you search specifically for my most popular metric, you will always get my website. If you google 'cheap X', you will get inferior websites.
Even with SEO optimized, there are just bigger websites that are friendlier with Google, linked by other websites, or maybe they have better SEO. Whatever the case, it makes me wonder what kind of websites I miss because I use Google.
> Users of Google search want the best results, not all results.
The ironic thing is Google is violating their earlier principles to provide "best results." IIRC, one of their big early differentiators (which they made a big deal about), was making the default query operator AND and not OR. A lot of early search engines used OR to pump up their "total hits" numbers, now Google essentially does the same thing by dropping terms from your query if the number of hits are "too low."
The answer has not been yes historically. There's always been way more content on the web than anyone can afford to actually make (interactively) searchable. The capacity of the index is a precious resource, and selecting exactly which pages to spend that resource on was always a key issue in search quality.
Can confirm. Had to manually submit all URLs to Google to get them indexed. Once indexed, everything worked fine, like it used to back in the day. Also, this has been the case for every website launched over the past 12-18 months.
In other words, this was already an issue before ChatGPT and the algo seems to be severely broken. High time for a thorough system check.
Adsense reviews are just as messy. Another department in need of a big shakeup.
I don't know what's going on there, but there's definitely lots of chaos behind the scenes.
They've been flooded with large amounts of new domains/sites hosting generated garbage content. I suspect this is just a barrier to slow it down until they figure out how to detect such spam better.
The quality of searches has gone down so much. A lot of the time these days you end up on these sites which have a page for every permutation of a sentence, each pointing to their page full of 'answers'.
Who is to blame? Google. For years they have pushed people to have slabs of text. Now I need to read through somebody's life story and how they stubbed their toe against a rock while hiking in the Sonoran foothills, followed by a recipe for pulled pork tacos.
Each time I search for something, the first two or three pages, maybe even more, are just shops and affiliate-link sites that offer the thing I want information about - at least, if the thing is a product. Not a single forum reference. I need to add the keyword "forum" in order to get such results. To me that is typical corporate degeneration and decline: too big to fail, too trashy to be relevant any longer, yet omnipresent.
It costs money, which is how they can be an actual search engine rather than an ad portal that pretends to be a search engine.
I have seen the future. Crap is free. Endless amounts of crap. Soon all this crap will be AI generated and designed to addict you. If you want anything that is not crap you will have to pay for it.
I'm a huge fan of self-hosted and decentralized stuff but a search engine is an area where I can't think of a way to do such a thing. The bandwidth requirements for continuous spidering and the data storage requirements are too high, and if you tried to distribute it you'd end up with an absurdly chatty protocol that couldn't be used by anyone with anything short of an unmetered full duplex fiber link.
The next best thing is a company that I pay to be a search engine for me and does that without trying to shove ads at me.
The complaint is still valid. I have subscriptions to a few sites, mostly news. I get why I need to login, otherwise how would they know that I've paid. It's still a pretty poor experience, I pay for a service, and now I need to fiddle with the settings in my browser so the site I pay for can remember my credentials.
I get that part of the issue is that I keep a weird privacy-focused configuration in my browsers and delete cookies when I close the browser. Still, it results in a poor user experience for a service that I pay for. I don't have any good suggestion on how to fix it, but it is valid to complain about having to sign in.
It's a poor user experience because you break it on purpose. Whatever other way they can come up with to permanently identify you, you will block that in the name of privacy and keep complaining.
I don't really care about privacy from a company that already has my credit card information, and my real name and address in most cases. I care about privacy from "the others", and the paid services are collateral damage. It does appear from another comment that Kagi actually thought about it and allows a token in the search URI, which is pretty neat.
You can get an access token from settings which can be provided in the URL. Set your search engine URL to include the token, and your searches just work, even in Private / Incognito.
Yeah, they are one of those weird businesses that provide services in exchange for money. They say it costs them $12.50 to serve 1000 queries, and the fee isn't that much more.
There used to be some sample searches on their blog, but it looks like those are also paywalled now. I wonder if that was intentional.
- Log into webmaster tools and find out what is going on? Maybe there are issues with the page. (I see Google doesn't call it webmaster tools any more.)
- You can explicitly submit a page to be indexed.
- To use the tools, you have to prove to Google that you own that domain: follow their instructions.
Google sees “putting your site online, getting it indexed, having it appear in search results, and receive traffic” to be competitive with “pay for advertising and premium placement.”
The long term answer will be some kind of “Google Blue” where you have to pay for any placement, not just premium placement.
Google has also begun aggressively filtering domains from sending email to Gmail users. This rolled out last November but they've ramped it up this past week to the point that many "indie" domains are 100% blocked from emailing Gmail accounts.
I haven't had issues sending emails to Gmail from my personal domains... but spamhaus/sorbs blocks Gmail on the inverse because people use it to send so much spam.
My website seems to have no indexing problems. I would be shocked if I had literally any other sites linking to mine (Small personal website), so at least from my perspective the indexing behaviour seems fine.
As a random data point, I registered a new domain about a month ago and didn't do more than add a simple landing page yet, and it is properly indexed by Google.
So many people were clicking on the link from here that it inadvertently DDOS-ed my server. I had to turn on the "under attack" feature in Cloudflare just to get the other sites running again.
My pages keep getting kicked out of Google's index for no apparent reason, only to be thrown back in just as randomly. It's like a never-ending rollercoaster ride! The GSC comes to the rescue by helping to get important pages indexed fast, but guess what? You can only submit a measly 10 URLs per day!
This makes me wonder if we will go back to the GeoCities model for hosting personal websites. Perhaps it is already the case with .github.io and .substack.com.
It has already become hard to discover websites hosted on their own domains, unless you find them organically (such as on HN).
Brave is working on this idea called Brave Goggles, where you can click on a goggle and it tweaks the algorithm to prioritize that type of content or give specific sites a higher ranking.
It's pretty cool.
Their results have been fantastic for me too, and it's supposed to be private as well.
It's pretty much replaced Google for me without me even noticing.
The problem is that Google actually has support for quotes, even if it's not reliable. Try searching for short song lyrics or poetry, or pretty much any obscure quote on ddg and you can't find it. Putting individual words and phrases in quotes does nothing on ddg unless they suggest it for you...in which case it often still doesn't work
How much traffic could that be? Given the number of comments on this page, I can't imagine it being more than a few thousand clicks? So why would any site go down?
I always wondered: how did we manage to run anything, let's say, 20 years ago? We should have many times more memory, CPU, and bandwidth available... Or are we just that many magnitudes more inefficient?
Websites often went down under load 20 years ago too but people were far more quick to think it was their own connection on the blink.
Yes, sites were much better engineered too, with smaller and less rich and less dynamic payloads, but don't count out a change in expectations along with it.
I don't understand this at all. If you have a static site (which a blog should be), CDNs will allow you to handle practically unlimited traffic for free.
Even without a CDN you can host the static files in a bucket for practically free.
Heck even serverless platforms usually give you 1M function calls for free each month.
It's definitely a function of expertise. You could get a free host that could totally handle the hug of death. You could know how to deploy a static website more efficiently.
attempting to reduce X to “just a function of cost” will almost always “work” - if one assumes themselves experienced enough to know how to spend hypothetical dollars.
the amount of traffic a website can handle is impacted by both. with insufficient experience, website won’t scale, money won’t be spent.
There are a bunch of tradeoffs--cost, cost predictability, control, redundancy, flexibility, etc. Money isn't a magic wand as you say and, honestly, I'm not sure how much extra I would pay in general on the off-chance that a blog post might go viral every 10 years--if that is indeed the tradeoff.
Another "bug" that seems to manifest quite often: if I search for a specific phrase or unique word on a page that I found in a SERP, so I know it's crawled that page, it often doesn't return that page either.
Add to that the automatic CAPTCHA-hellban you get if you use "site:" in anything more than a tiny amount (and the one you still get if you search "too much"), and I realise that there's increasingly huge amounts of information out there on sites that Google may have crawled before and knows about, but doesn't want to show me for some reason. I remember it used to be much easier to find information about obscure topics even if it meant wading through dozens of pages of SEO spam; now it's nearly impossible for anything but the most vapid of queries.