On the other side, I've also noticed it appears to have been aggressively pruning its index over the past few years, so the fact that it's crawled your site doesn't mean it's necessarily searchable either.
Another "bug" that seems to manifest quite often: if I search for a specific phrase or unique word on a page that I found in a SERP, so I know it's crawled that page, it often doesn't return that page either.
Add to that the automatic CAPTCHA-hellban you get if you use "site:" in anything more than a tiny amount (and the one you still get if you search "too much"), and I realise that there's increasingly huge amounts of information out there on sites that Google may have crawled before and knows about, but doesn't want to show me for some reason. I remember it used to be much easier to find information about obscure topics even if it meant wading through dozens of pages of SEO spam; now it's nearly impossible for anything but the most vapid of queries.
Another bug I'm noticing lately is it'll flat out ignore things sometimes, even if you put a term in quotes or try to exclude it with a -leadingdash. About 30% of the time if I use those operators, they'll have no effect on the results. I don't understand why they'd make things worse on purpose, but I don't know how it could be just a "mistake" no one noticed.
Search engines in general have realized that it's more profitable to show you irrelevant results than to show you nothing. Furthermore, they've realized it's more profitable to show you irrelevant results laden with their ads than show you highly relevant results from ad-free sites.
This is precisely what happened. When google merged with doubleclick.net the new company should have been named doubleclick.net and not google. The old google ceased to exist at that point and was swallowed by an advertising company.
I strongly agree with this Bill Hicks bit on advertising:
I know there were rumblings in the late 00's and early 10's about how McDonnell Douglas culture and executives were ruining Boeing.
But some people take a step farther back than this and blame Congress for the 737 MAX. They basically forced the merger, and unhappy weddings make for unhappy homes.
I've seen plenty of mergers where there's a weird brain transplant and flippozambo! the acquired company's leadership is now in charge of the buying company. The fish, as they say, rots from the head.
I totally agree with you. Ads have influenced everything they’ve done since. It’s like a brilliant, talented individual who has been addicted to heroin for a decade.
Yeah but google also became wildly successful to the point that they blow money on ventures with no real business plan and give up two years later when they can't turn a profit. They effectively have a blank check at all times. They're more like a businessman who's addicted to making money at the expense of all their personal relationships.
Regarding unhinged ideas, doubleclick is quite old, but is it old enough that opening a hyperlink would've typically required a double click at the time? Or is the metaphor here that their ads are so amazing people are double-clicking them in ecstasy?
Double-click, as others have said, was never something you did with hyperlinks, even before the web.
Double-clicks were used with icons on the desktop because you could do more with an icon than just open it. You could move it, copy it, etc. Double-click was a convention for a shortcut to open the reference of the icon. A single-click would have not allowed those other actions.
Because of this, double-click became business speak for going to the next level of detail, digging into, etc.
The idea behind this name for ads was: this company makes ads relevant and compelling, so users drill into them and find whatever you want to advertise.
For what it's worth, because of the affordance you mention, users consistently double-clicked banner ads, and most other things they wanted to activate, even though they didn't have to, and even after they learned they only had to single-click the blue underlined things.
You can but that means hovering over files for a second will select them (and clear your previous selection). Other systems (e.g. KDE) manage single click to open without that annoyance.
> A single-click would have not allowed those other actions.
Yes it would. And did. Windows chose double click to open but other systems managed with a single click while still allowing you to drag around icons and files.
I find Amazon really irritating for that. I do a search for a very specific thing, and a ton of results always come back, often having nothing to do with my search request. And sponsored results both at the top and scattered through the results.
Amazon has gotten so bad that unless I know an exact part number or model, I don't bother. I'll go somewhere else for any research and only come back to Amazon if I want to price-shop what I found.
Even with an exact part number, it will often push related items first. I was searching for a specific thermal printer, literally using the PN (something like C18647585), and it still decided to show me "sponsored" and related thermal printers first. So it somehow knew that part number as a keyword for thermal printers, but just didn't want to show me the one result that actually would be helpful (it was a third party seller, so maybe that penalizes the result?)
There's Worldcat, though its site redesign last year made it useless for me.
I'm finding Open Library (part of the Internet Archive) is increasingly useful for book search.
You might also have success with a major library (e.g., the British Library, the Library of Congress, major US city libraries such as NYC, Boston, LA, San Francisco, or Chicago) and some academic libraries. Watch that these aren't in fact backed by Worldcat though. (Many local library systems are.)
Goodreads is tolerable, but mainly as a data source. The product itself has been in maintenance mode for... a decade?... or basically since the Amazon acquisition.
They rolled out a completely new design semi-recently for... half of the pages... and left the other half on the 10+ year old styles.
It just feels like Amazon is happy to take advantage of its dominant position with Goodreads having a more complete catalog than any of the other more open offerings. And yet, they seem to invest no effort in modernizing or improving the site, making it more performant, etc. The moderation tools kinda suck too — doing super common things like merging (incorrect) duplicate listings is a PITA.
Also the app exhibits blatant conflicts of interest like prioritizing buying new books from Amazon over, e.g., digital library loans, with no option for users to configure that.
Amazon owns Goodreads. It's not independent. It's also not mentioned on the site afaik. (They also own IMDb and a bunch of other internet companies that aren't Amazon-branded). If you want something independent, try storygraph or librarything.
This is particularly bad if you search for a type of thing, e.g. "mechanical keyboard". Many of its top suggestions will be for nonmechanical keyboards and that won't be obvious without reading their descriptions carefully.
"keyboard non mechanical like cherry switch membrane touch rgb light clicky gaming blogging ergonomic xiaomi redmi arduino android ios windows laptop desktop computer tablet phone", and if you sort by price, the first one is $0.99.
Then you have 5 different colors of plasticky $15 keyboards and a USB card reader for $0.99 to choose from.
What’s infuriating is how this lying has become normalized in “good” brands. For instance, try to buy a 60” TV. I do not think you can find one. They are all 59.5” and sold as ‘60” class’.
Stop buying from Amazon. I haven’t bought anything from them in years. There is nothing that Amazon offers for sale that you can’t find somewhere else, aside from maybe entertainment content that they produce.
Don’t reward bad behavior or they’ll keep doing it.
What you can find on amazon that you can't find elsewhere (at least here):
- All your regular non-food needs in one cart.
- No-hassle refunds if there is a problem with the order.
The first one is just convenience and I could do without it. The second one is where most other stores fail. Or at least enough of them that I don't want to risk having to phone their hotline or pay for return shipping because they or their delivery contractor fucked up - or fraudulently claim they tried to deliver when they made no such attempt at the specified address and instead want you to waste your time picking up the package at a random location across town.
Why would you expect to get a 4chan page? None of that data is persistent. IIRC Google relies on links to the page, so that is impossible; plus the content rotates constantly as threads drop off the last page.
Generating money for google is not the only metric that matters for the users. The incentives are perverse from the perspective of everyone other than google executives and investors with significant google holdings.
I notice this happening when the actual query would have returned 0 results: Google "helpfully" will modify your query (such as by dropping quotes) to generate more results.
This is super annoying because it doesn't appear to inform you of this anywhere in the UI, until you click through to page 2 and see what it modified your query to be.
Over the years, I've found the frequency of "0 result" queries has gone way up. Subjective anecdata from me, but it's a pretty big difference. There must be some large areas of their index that have been dropped over time.
From what Google's hinted at and probably your own experiences (which I reckon are like mine), it's pretty clear that most folks aren't great at Google searches. This might be why Google has leaned on AI to "guess" the best results. They figure their AI can predict what you want better than you are able to specify via your search query.
Which has the really annoying side effect that now you actually need to produce a worse query to get decent results.
A few years back I used to scoff at people who wrote searches in the form of literal questions instead of boiling things down to key terms, e.g. the "how do i..., what is the largest..." type searches.
Nowadays I not only need to write stupid searches like this to get better results but quite literally find that my brain has adapted to the pattern and my past skill at crafting salient, key-term, operator-driven queries is eroding.
I've noticed the same. And sometimes you'll get search results showing 10+ pages, but if you actually follow through them, the results die by the second page. Google also omits many domains from search results now.
Sites like Twitter and Instagram also frequently completely change the search term now to something else for certain queries. This practice is anti-competitive of the highest order. The very foundation of having a text search is to have an exact query match to begin with... The alternate spelling item should only be a suggestion in results at the most, but they've flipped this now, and that's outright deceptive.
Might be desktop exclusive? For me on mobile (testing another random phrase as now yours hits this post), I don't see that text, or any other indication the query has been modified.
I'm used to that. But this is worse: it happens even when the search I enter would give the ideal results, ones I only reach because I eventually contort the search parameters enough to find some results.
Funny that people call these "bugs" as if anything related to google search happens by accident.
They don't need to waste the eng resources or infrastructure on rock solid search anymore, they own the market and got all the users into their funnel of products, most locked in for life.
Search results still show sponsored listings; they still have all the users and all the profit, and a lot less of the profit-sucking operational costs it took to be good at what made them a household name: search.
They aren’t inserting junk, they just don’t do anything to rank quality results above the junk anymore. The junk was always there, on page 2 and beyond, and who would ever need to hit those pages. Nowadays I’ll be 15 pages deep and so far off base of the search term that I could write a better index with curl and regex.
The issue isn’t that they are watering down the cream in the milk, it’s that the cream isn’t part of the milk at all now.
There was a Google search engineer on Reddit who personally claimed the opposite: Google is going down the drain, but the alternatives aren't any better. Of course I can't find it now, thanks Google.
I wish there was a search engine that ran like mid 2000s google but with a social media component so you can down vote SEO spammer blogs into oblivion.
Unfortunately, content farms can push new websites and blogs faster than you could ever downvote them. LLMs are going to make that task increasingly easy. I've no idea how we're ever going to be able to search anything anymore using classic search engines. We either go back to website directories, or forward to AI-generated content.
Perhaps that is one additional layer of friction that will make human moderation / social voting feasible. The fire hose of AI trash content will come too rapidly for it to work at layer 1 (all content), but if the barrier to entry is a financial transaction to take over placement in a human-curated webring or directory it becomes easier to moderate / vote away the trash.
Add a trust metric and chains of provenance. Bad ring link -> bad trust percolating up that chain. Little trust, your site isn't always shown as part of the ring. Too much loss of trust, you're out.
(Ultimately, this is a bad facsimile of human group behavior - all the way up to shunning people who deeply violate group norms. And I don't think it'll scale super-well. )
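To make that concrete, here's a minimal sketch of how a penalty might percolate up a chain of endorsements; the sites, penalty factor, decay, and thresholds are all made-up assumptions for illustration, not any existing system:

    # Hypothetical sketch: trust percolating up a webring/directory chain.
    # Sites, penalty, decay, and thresholds are illustrative assumptions.
    SHOW_THRESHOLD = 0.5    # below this, a site isn't always shown as part of the ring
    EJECT_THRESHOLD = 0.2   # below this, it's out

    def propagate_penalty(trust, endorsers, bad_site, penalty=0.4, decay=0.5):
        """Penalise bad_site and, with decreasing weight, whoever vouched for it."""
        trust[bad_site] = max(0.0, trust[bad_site] - penalty)
        for hop, endorser in enumerate(endorsers.get(bad_site, []), start=1):
            trust[endorser] = max(0.0, trust[endorser] - penalty * decay ** hop)
        return trust

    # Who vouched for whom (site -> list of endorsers up the chain).
    endorsers = {"spam.example": ["ring-maintainer.example", "root-directory.example"]}
    trust = {"spam.example": 0.5, "ring-maintainer.example": 0.9, "root-directory.example": 0.95}

    trust = propagate_penalty(trust, endorsers, "spam.example")
    for site, score in sorted(trust.items()):
        status = ("out" if score < EJECT_THRESHOLD
                  else "sometimes hidden" if score < SHOW_THRESHOLD
                  else "shown")
        print(f"{site}: {score:.2f} ({status})")

The bad site drops out entirely, while the ring maintainer and root directory take smaller hits the further up the chain they sit.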
Except there's no provenance or root of trust. There is (IIUC) no back-propagation of a penalty if sites violate trust, just an overall observational measure.
And I'd still say pagerank did work really well in an Internet where there was overwhelming trust. But in a world where default-trust is a bad stance, I believe there needs to be an equivalent of what "You can trust X" does in small in-person groups. (Or, alternatively, "Sheesh, X went off and just destroyed all trust.")
I do think it'll need to be more than a single metric, too. Trust is multidimensional by topic (e.g. "I trust the NYT's data science folks, I have zero trust for the OpEds"), and it is somewhat personal. (E.g. I might have experienced X lying to me, while they've been 100% honest to you - maybe in/outgroup, maybe political alignment, maybe differing beliefs, etc. Ultimately, what we call trust in an indirect situation is "most of my directly trusted folk vouch for that person".)
Keyservers. You decide which keyservers to register with and to trust for verifying others. Browsers would handle en-decryption automatically and allow you to flag, filter, or finger (in the Unix sense).
I used DDG for a while, but DDG's quality fell precipitously a few years ago (similar issues where it ignores quotes and won't find pages even if you search for the title string exactly, etc) and I eventually came back to Google which has also been increasingly frustrating.
> I wish there was a search engine that ran like mid 2000s google but with a social media component so you can down vote SEO spammer blogs into oblivion.
There's no way this won't get abused, but the SEO stuff is out of control. Not even spammer blogs, but if you have a quick question like "how do I check tire pressure" you will only get articles that start with a treatise on the entire history of car tires and the answer is deeply buried somewhere in the article. My guess is that Google sees that we're on the page for a longer time than we would spend on pages that just return the answer, and they assume that "more time on page" == "better content" or something.
DDG has become ridiculous. They seem to be merging in "local", geoIP-based results no matter what country I select on the region list (or whether I disable it). Very often completely unrelated (but local) stuff appears as the 5th or 6th result, midway down the first page.
Most egregiously, I will search for something very rare (e.g. about programming) and DDG will return results about my city's tourist/visitor info. It's as if it just keeps ignoring words from the search prompt that return no results until it runs out of keywords, and then it's just the geoIP results.
I hate this forced localization so much, and it's everywhere. The internet used to be a place where you would actually encounter stuff outside your locale.
That is because DuckDuckGo started relying almost entirely on Bing for their regular search results after first Yahoo gave up maintaining its own index and then Yandex became part of a natio non grata, leaving them to choose between partnering with Bing, partnering with Google, or creating their own index: https://help.duckduckgo.com/duckduckgo-help-pages/results/so...
The tire pressure query is exactly the kind of thing that AI should be able to handle easily, though. At which point google has an incentive to sort their competitiveness out.
Love kagi. The first time I got the "your payment was successful" notification I felt like I'd never get that much value out of it. But now, a few months later, I feel like I could never go back.
No? At least not like you are implying. Kagi queries multiple data sources and synthesizes results. This means Google’s failure to index does not impact Kagi in the same way as it would DDG (with Bing).
Though I'm a subscriber myself, Kagi doesn't really add results, does it?
It merely weeds out the trash for you. So you can get to the bottom of search results.
Looks promising, though I noticed that it doesn't encode queries properly when searching. For example, if you go to the homepage and search for "../robots.txt", you'll be redirected to the site's own robots.txt file.
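Presumably the fix is just to percent-encode the query before it's dropped into the URL. A rough sketch of the idea (the /search?q= endpoint here is a generic assumption, not that site's actual API):

    # Percent-encode user queries before building a search URL, so input like
    # "../robots.txt" can't be treated as a path. Endpoint is a placeholder.
    from urllib.parse import quote_plus

    def build_search_url(base, query):
        return f"{base}/search?q={quote_plus(query)}"

    print(build_search_url("https://search.example", "../robots.txt"))
    # -> https://search.example/search?q=..%2Frobots.txt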
What I want is a "serious mode" that makes it favor primary sources, peer reviewed papers, and raw data. When I search for economic data, I don't want a million news articles referencing pieces of it. I want the raw data release. When I search for some video going viral, I don't want a million videos of journalists talking and showing clips. I want the full raw video.
Beautifully said! As a thinker of philosophy, I have come to understand that our clip-society is this way by design. People can express power over others if they tell you a construction and then show a clip to support it. They really don't want you to see the source/what it is/the truth. They want you to see what they show you. This problem is accelerating in western societies and it is a fundamental problem of human nature. Journalism is the healthy expression, and what we see in today's media is the sickly end.
> I wish there was a search engine that ran like mid 2000s google but with a social media component so you can down vote SEO spammer blogs into oblivion.
This is sort of what I've been trying to do with Marginalia Search, except I don't really believe a voting system would work. It's far too easy to manipulate. Been playing with the thought of having something like an adblock-list style system where domain shitlists can be collaborated on and shared without being authoritative for the entire search engine.
My search engine is still pretty rough around the edges and limited but I think it works well enough to demonstrate the idea has some merit.
> Been playing with the thought of having something like an adblock-list style system where domain shitlists can be collaborated on and shared without being authoritative for the entire search engine.
Even just personal shitlists would be golden and make just about everyone happy.
Something I've wanted (which probably exists as an extension in Chrome?) for Google searches is a simple blacklist. Just a little button and confirmation next to a result, telling it to never show this blog-spam-ad-laden-SEO-mess of a page to me ever again. Maybe it's an uphill battle, but for some niche topics (like certain games) there are some sites I keep having to scroll past and sometimes accidentally click that are written in SEO-speak and say a lot without saying anything at all.
Remember Guestbooks? You'd visit a website, volunteer your name and which country you were from, and leave comments. And it wouldn't be a cesspool of spam and porn and XSS attacks. How quaint!
Oh gosh, yes! And reading the guestbook was always so fun. An elderly friend of mine passed away in 2018, and in doing a (google) search of him, I found guestbooks he'd signed 20 years ago.
You would look for a thing and the first five pages were random mailing list discussion archives discussing how the thing was 5 years before... Altavista was impressive, but there is a reason why it went away.
> I wish there was a search engine that ran like mid 2000s google but with a social media component so you can down vote SEO spammer blogs into oblivion.
I want this too, but I think an often understated aspect of this issue is that by this point Google has absolutely trashed the web of that era. In these threads people will say "the content you want isn't out there, it's all on social media now" -- and they're largely right, but I think Google is the party most responsible for mutilating the web to the state it is in now, and users fled to social media partly because it seemed like a safe haven.
What we need is a concentrated effort to rebuild the web. Take the best parts of what we've learned and combine with the best parts of what we've left behind and try to build something better, for humans, not for advertisers and hyper-capitalists.
That will take time, energy, and people who remember what we lost and believe we can build something better. A better search engine alone is not enough.
Largely right, but actually a lot of that stuff is still out there. The personal and hobby pages, forums, blogs, etc.
Google just doesn't know that they exist anymore, or rather doesn't want us to know, because those sites are not commercial enough or big enough.
Almost without fail, no matter what you search for, it tries its best to turn it into a search for a product or service. And those content oriented websites don't fit that, so it just pretends they don't exist.
The web changed when every kinda slimy business bro realized they could monetize gaming search results. No matter what your fantasy web looks like, be assured, people will game it to the point it's not what you intended.
If I take that viewpoint on everything I might as well live as a recluse in the woods and avoid people altogether. I have to believe that there are enough of us out there that genuinely want to build better things for people.
The web, just like the real world isn’t static. Becoming and staying intellectually, emotionally and physically mobile may be the only long term strategy to avoid ending up in one or the other dystopia, sooner or later.
When rates of change were slower, you might only have to “move” once in your life, but with increasing rates of change in our human experience, staying nimble is arguably of ever increasing importance.
And my point is that there are probably a lot of those motivated people working on the problem today. You make it out as though we've arrived at this state by either lack of effort or competence by Google/Microsoft. My guess is that every time they change the algorithm, the spammers adapt too. That's inevitable and would be just as much of a challenge for your supposed utopia. If you have some secret they don't, there's certainly plenty of money to be made.
I think google does OK with the syntax it still supports for text queries, but if you switch to the images tab it just throws all of that stuff out the window. I would love to be able to search for "cat eating watermelon" or whatever and only get results with cats eating watermelon, ordered by the proximity of that text to the image returned. Hopefully AI is going to do something for that, but the state of the art, as embodied by the biggest player (Google), is shamefully deficient.
It's even stupider than that. There are only two major, publicly available web indexes in the USA, Google's and Bing's. After 24 February 2022, DuckDuckGo ended their partnership with Yandex, and since then they say "we have more traditional links and images in our search results too, which we largely source from Bing" https://help.duckduckgo.com/duckduckgo-help-pages/results/so...
Bing at least licenses its index to partners on a commercial basis, as did Yahoo until they gave up indexing the web. I am sure that the NSA, the Chinese government, the ahrefs website, and other organizations have comprehensive indexes of the web which they don't share in this way.
Be careful! The Google search guys will come on HN and gaslight you about this, claiming that the advanced search functionality works perfectly and it's simply user error.
We know it's not, but expect them to try to tell you you're imagining things.
> On the other side, I've also noticed it appears to be aggressively pruning its index in the past few years, so the fact that it's crawled your site doesn't mean it's necessarily searchable either.
I've noticed this as well. I have a crappy website for my app that I need to do better marketing for (not my priority just now), but I've noticed that, however crap it is, I have received ZERO incoming hits from Google, apart from a couple of people who have literally just googled my domain name.
I do not believe for a second that, in the 2 months the page has been up, there wasn't a single query made globally for which my website was at least a bit relevant. Either that, or the spam problem Google has is much bigger than anyone thinks.
Yet another data point in favour of the Dead Internet theory.
You could try the google search console - it gives you a view on what hits/clicks have come in over time.
edit: Hah. I notice it suggests using it at the top of the page if you use 'site:..." - and I only get 5 results for my site when the console claims to have indexed 10 times that many!
edit2: Also duckduckgo returns more like 15 hits ...
Silly to see people complaining about search results and indexing without backing those claims with data from search console. It’s like devs turn off their brains when it comes to marketing because they don’t like it.
Google's bloody Search Console says I got 16 impressions in 2 months for literal searches of my domain name, and nothing else. Funny seeing people thinking I got those figures by reading tea leaves.
I have all sorts of things that I wrote years ago and I can never find them searching by title unless I put the specific name of the site in the query. I sure can find the slideshare though where some guy from Oracle stole not only my title but much of the content from my blog.
"The dead Internet theory is an online conspiracy theory that asserts that the Internet now consists almost entirely of bot activity and automatically generated content, marginalizing human activity. The date given for this "death" is generally around 2016 or 2017."
What I find funny about that framing is that, regardless of whether or not the theory has merit, a conspiracy theory by definition asserts that there exists two or more people conspiring with the intent to produce the alleged outcome. From what I understand, dead Internet theory alleges no such collusion or intent. I could be wrong but I believe that it merely suggests that the amount of bot-generated activity has come to dwarf human generated content to the point where the Internet is effectively "dead" from the perspective of its original purpose: humans sharing human knowledge.
10 or so years ago I wound up blocking everyone other than Google in my robots.txt because I was sick and tired of webcrawlers from China crawling my site twice a day and never sending me a single referrer. Same with Bing. Back when I was involved with SEO the joke was you could rank #1 for Viagra on Bing and get three hits a month.
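For anyone wanting to do the same, the robots.txt rules look roughly like this (only well-behaved crawlers honour it; the rude ones ignore it anyway):

    User-agent: Googlebot
    Disallow:

    User-agent: *
    Disallow: /

An empty Disallow for Googlebot allows it everything, while the wildcard rule blocks every other crawler from the whole site.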
At least so far, according to Cloudflare, bots make up around 1/4 of all internet traffic. But that could be pretty far off depending on how they get those estimates.
The figure I saw most recently was 42%. Weirdly my brain can remember the number but not where I saw it.
But what I'm curious about, whichever number is true, is whether people mean "malicious bots" when they say this, or just any kind of autonomous agent. And also whether they are counting volume of data or simply network requests.
Because if by "bot" they just mean "autonomous agent making a network request" then honestly I'm surprised the number isn't higher, and I don't think there's anything wrong with it. Every search crawler, every service detector, all the financial bots, every smart device (which is now every device) and a thousand other more or less legitimate uses.
I've got a script for parsing my web logs which removes all the lines which match persistent indexers/bots/scrapers and any obvious automatons. Logs generally shrink to 40-50% of their volume, so I'd at least double CF's estimate.
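The filtering doesn't need to be fancy; a rough sketch of the approach (the user-agent pattern and log filename are just assumptions, tune them to your own traffic):

    # Drop log lines whose user-agent matches known bots/indexers and see how
    # much volume is left. Pattern and filename are illustrative assumptions.
    import re

    BOT_PATTERN = re.compile(r"bot|crawler|spider|indexer|curl|python-requests",
                             re.IGNORECASE)

    total = kept = 0
    with open("access.log", encoding="utf-8", errors="replace") as f:
        for line in f:
            total += 1
            if not BOT_PATTERN.search(line):
                kept += 1

    print(f"kept {kept}/{total} lines ({kept / max(total, 1):.0%} non-bot)")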
In this video they rename it from 'theory' to 'prophecy'. As in the internet isn't quite dead yet, but its rot filled bloated body is near its dying breath.
What I fucking hate is writing a query, sometimes even with parts in double quotes to clarify, and google "helpfully" correcting it to something unwanted, and then putting up the damn captcha when I click the link to search exactly what I want.
Another thing I've noticed: Google only indexes what people search. Meaning, sometimes if you search for something obscure and you don't get good results, come back a week later and you'll get much better results because your query is now a part of their indexed search terms.
This I noticed some years ago. It seems much like, if the number of returned results doesn't meet a given threshold, some kind of optimizer runs overnight on these searches in order to provide a more extensive result set.
Super interesting discovery! I wonder if whatever algorithm Google is using has reached its scalability limit on today's Internet, and it takes some kind of overnight batch job to do obscure searches usefully. Maybe all Google Search is doing is serving a giant cache of slow search results.
This was some years ago. Notably, I observed it in relation to search suggestions. You could enter a search and get zero results, but a day or two later, you'd get at least a suggested search term (regardless of how accurate or meaningless this may have been). So I guessed these were built up, at least partly, retroactively. With results now happily including these "sympathetically adjusted search terms" without presenting this as an explicit option, I'd guess this may now happen automatically.
> Add to that the automatic CAPTCHA-hellban you get if you use "site:" in anything more than a tiny amount
Pretty much any advanced operators seem to do it for me, notably "intitle:" and "inurl:". I'd wager that there are a lot of automated searches using these to look for exposed admin interfaces, but I find them extremely useful for filtering out the crap that clogs up results when a ton of news sites all regurgitate the same viral press release or wire article.
Just FYI, the database that is used for site:domain.com queries is actually not the same database that they use for live searches.
So you may see a certain number of pages using the site: command, but fewer pages (or none at all) may actually be indexed.
If you want pages indexed, put them in an XML sitemap file, make sure there are internal links to them on your site, and external links from other sites really help. Third-party indexer tools help as well.
Google results have become so bad that I use "site:" for a majority of my searches these days. I have a bunch of Chrome search engine keywords set up so that I can go straight to results on Wikipedia, Economist, Reddit, Stack Overflow, Cppreference, etc.
It's concerning that they're even nerfing site search, which seems like a core feature for a search engine. You could argue that Google isn't really a search engine any more, but rather a general knowledge engine and advertising platform. I hope somebody can build an alternative to Google that does what a search engine is supposed to do, i.e. index the web without all the extra garbage. But maybe SEO has killed that dream at this point.
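For anyone who hasn't set this up: in Chrome's search-engine settings you add a shortcut keyword plus a URL template with %s where the query goes. The keywords and sites below are just examples:

    Keyword: w    URL: https://www.google.com/search?q=site%3Aen.wikipedia.org+%s
    Keyword: so   URL: https://www.google.com/search?q=site%3Astackoverflow.com+%s

Typing "w foo" in the address bar then runs a site:en.wikipedia.org search for "foo".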
> there's increasingly huge amounts of information out there on sites that Google may have crawled before and knows about, but doesn't want to show me for some reason
This is some machine learning stuff they are doing: instead of indexing all the specific keywords, they are creating vector embeddings, basically summarizing what's on the page and matching on similarity to your query rather than on specific keywords. Good for casual searches, but extremely annoying for power users.
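A toy illustration of retrieval by embedding similarity rather than exact keyword match, which is roughly the behaviour being described; the tiny vectors here are invented purely for the example (real systems use learned embeddings with hundreds of dimensions):

    # Rank documents by cosine similarity of embeddings instead of keyword match.
    # The 3-dimensional vectors are made up for illustration only.
    import numpy as np

    docs = {
        "page with the exact phrase": np.array([0.9, 0.1, 0.0]),
        "vaguely similar page":       np.array([0.6, 0.6, 0.2]),
    }

    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    query_vec = np.array([0.7, 0.5, 0.1])   # pretend embedding of the query
    ranked = sorted(docs, key=lambda d: cosine(query_vec, docs[d]), reverse=True)
    print(ranked)   # the "similar" page can outrank the exact-phrase page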
Anecdata, but I can confirm a uniform and long-standing experience that adding colon-based operators to a search query results in a CAPTCHA challenge every single time on a subsequent search, even if the subsequent search is 'vanilla' (i.e., no operators). It has been like this for more years than I can remember. Apparently this kind of 'advanced' usage is an indication of bot activity.
I have never had this experience once in... decades? I use operators such as site: frequently. I suggest there's some other property of your environment that's setting captcha off - vpn, shared sketchy ip/network, etc. Bad actors suck.
Same. I use it every day for the majority of my G searches, and have never once seen a captcha (except when using a VPN). OP, are you logged in to google? I am. Wonder if that’s the difference?
welcome to the machine learning future, where anything you do that is a statistical outlier gets you algorithmed by a machine that is incapable of reason but knows when you're different.
As a person who has been a statistical outlier most of my life, I am dreading this. It's bad enough dealing with human impressions and misjudgment, but now we get it from our computers too, which used to be logical, deterministic havens.
For what it’s worth this never, ever happens to me. These days I only get captcha’d when someone’s laptop on the same network gets owned and is being used to hit google.
Hmm... VPN, big proxy, or some other contributing factor? I use site: all the time, not on chrome, and without being logged in... If I've ever gotten captchas doing so, it wasn't frequently enough to see a pattern. Maybe some property of the site makes a difference that puts your usage and my usage on either side of that fence?
Anecdotal, but this happens to me a lot, and not just with the "site:" operator. Generally using any of the advanced operators seems to set it off. Things like inurl:, intitle:, etc, trigger it also. Not every time, but after a few times. From a normal ISP connection, no VPN, even while logged into Google, etc.
I've never gotten a CAPTCHA-hellban that I know of, but I absolutely get a CAPTCHA when I use "site:" for more than just a couple searches. (It sounds par-for-the-course w/ Google, though...)
Yeah, I get these. The problem is that the Captchas take forever to fill out (like 5 minutes of challenges). But the worst part is that the captchas are asking for wrong answers. It tells you to select scooter and there's no scooter in the photo but it thinks there is. So you just end up stuck in a captcha loop for a long time.
I am not sure why I get them but it might be due to using anti-fingerprinting tools.
I've wondered if it isn't intentionally impossible to solve, because "the algorithm" decided that you're a bot or malicious and they want to spin your cycles endlessly. The effect on me now is that I won't even try anymore; I'll just take a different route. That may even reinforcement-teach the system that I was a bot that couldn't solve it.
I think it's more malicious than that. They know I use privacy tools and can't be tracked -> they can't make money on me -> bully me into not using their service.
It may also be part of their anticompetitive war on other browsers. I get captchas constantly in a new default Firefox profile, but not in a new default chrome profile. Spoofing user agent to recent chrome agent in Firefox makes the captchas happen far less often for me.
This is probably the common thread among all the people reporting this. As an alternate data point, I haven't experienced the captcha from using advanced search queries.
>now it's nearly impossible for anything but the most vapid of queries
I've noticed that myself, looking for very precise content, which I know is out there but failed to bookmark. (Most recently, for amateur astronomy and roleplaying.) The solution to finding niche stuff now seems to be digging through relevant reddit or forum threads, hoping someone posted a link to it.
Very true. Some client websites have had half their keywords gone from position 1-3 to deindexed, then back, then gone, then back, and that's been since February 2023.
I assume it is so that websites don't abuse it to build search boxes for their own sites without showing ads?
E.g. I can build a searchbox on mywebsite.com, and if you type "hamster" I'll just query google for "site:mywebsite.com hamster" and return the results to you. That way, my site can be static but still have a search box, and google has all the work but gets no money.
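A sketch of what that freeloading search box amounts to (the domain and query are placeholders):

    # A static site's "search box" that just forwards the visitor's query to
    # Google, scoped with site:. Domain is a placeholder.
    from urllib.parse import urlencode

    def scoped_search_url(user_query, domain="mywebsite.com"):
        return "https://www.google.com/search?" + urlencode(
            {"q": f"site:{domain} {user_query}"})

    print(scoped_search_url("hamster"))
    # -> https://www.google.com/search?q=site%3Amywebsite.com+hamster

Which would also explain why heavy site: usage looks like automation to them.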
Startpage does the same thing when I use site:, but with no captcha to bypass the hellban. Sometimes it just shows no results intentionally. Refreshing the page fixes that one.
They are probably trying to reduce "misinformation" by removing most of the web from their index. With AI, they could just ask bard, "does this website contain any information that would be considered misinformation?" and then just ban it.
If you want "misinformation," or to just search the web like it's the mid 2000s, you can use http://Yandex.com. They do a pretty good job on controversial queries. Google has gotten so political that they even have this "results are changing rapidly" page they return when there's been some new political hot topic that they haven't gotten the commissars at headquarters to weigh in on yet as to what's going to be the official narrative.[1]
Yandex is also censored but in the other direction. Probably does censor less than Google but enough that you shouldn't rely on it alone for topics that involve Russia. Their index is also limited in general when it comes to non-russian content. But it does return many things that Google would rather you not see so it is invaluable if you want to get the full picture.
Yandex image search also has an infinitely better interface, linking you directly to image sources and not being full of links to sites that want you to sign up before showing anything, like Google is. It's still not perfect, and IME it often groups images too aggressively, which effectively hides "similar" results.
Nice try FBI. In all seriousness though, has it actually gotten so bad that yandex of all search engines is less censored? Or is it just less censored when it comes to topics controversial to the US (and not russia)? The fact that so much censoring is going on that google has a "hold on while we censor this" page is insane.
I'd never noticed any issues with Google until a few months ago where I was googling an exact phrase that I knew appeared on one site. Google gave me nothing but DuckDuckGo found it.
The site is probably 20 years old and has no SSL, but still... giving me no results is worse than giving me the one correct result.
>and I realise that there's increasingly huge amounts of information out there on sites that Google may have crawled before and knows about, but doesn't want to show me for some reason.
Why provide the best product when you only have to have a product slightly better than your competition? After that, everything is profit.
Couple that with the fact that a huge portion of new sites seem to be bot-generated shit copied from other places on the internet, and it seems Google has given up on the open web.
As long as people don't abandon search, yeah, they do. If they lose their absolute dominance in search, they will automatically have competition on adsense too.
Or maybe Google disagrees with my assessment, but I can't imagine what kind of inside information would make them do that. It looks like a very clear and inescapable reality to me.
You should know that's not how capitalism works. They have to keep making more money per dollar every year or they get punished in the market. They've tapped out on their limits of growth and now actual costs are increasing due to floods of automated crap at levels far beyond what we had in the past.
They have a practical monopoly on web and mobile ads, if they really are stagnating then all they need to do is jack up prices by a fraction of a cent and it's already billions in profit. I'm sure they have no problem increasing revenue over time.
Given how stupidly common ads are, increasing prices and upping scarcity would be a good thing overall anyway.
I've always wondered why we got rid of curated directories and changed to search for almost everything (and yes I do realize that volume of sites is problematic).
Also, anything past the first page will just show you the same crap as the first page. I used to be a power user of operators like 'site:' but agreed, it results in a captcha every other page sometimes.
> so I know it's crawled that page, it often doesn't return that page either.
This is your misunderstanding. The fact that a thing was in the index does not ensure it will always be there. Things disappear from the web all the time. Serving fresh docs means not only crawling the new stuff but also deleting the unreachable stuff promptly.
No, I literally did a site: search seconds after visiting the site to see if it had any other pages with what I was looking for, and it found zero results --- not even the original page I found.
I recognise this as well. I write for a living, so I'll do lots of searches to cross-check stuff. But if you search too quickly, or too 'weirdly', or whatever, you'll have to pick out bridges or zebras or whatever is the current fashion in Captcha.
The best one is "select the photograph containing a crosswalk". How am I supposed to know what a crosswalk looks like in each & every culture on earth?
I mean, as a human, you are expected to use context clues.
You don’t need to know the markings used for crosswalks in every place around the world to know what a crosswalk looks like based on its purpose. There’s only so many ways to create a pedestrian crossing across a street, after all.
If anything, that seems like an extremely appropriate choice for something attempting to restrict access for bots that wouldn’t necessarily be able to act on the same context clues and intuition.
This doesn't really cross cultural boundaries. For example, the skull and crossbones means nothing to Iraqis despite universally being seen as a sign of danger and caution in the US.
You're asked to point out the designated crossing area for pedestrians across a street. Sure, some places use crosswalk stripes perpendicular to the street, others use squiggles, others use lines on the sides, and some don't use any markings at all, but it should be plainly obvious to anyone, anywhere in the world where the designated area is based on there being some marking, or control devices, or literally people walking in the photo.
This isn't rocket science. Using contextual clues to figure something out is literally one of the most basic human abilities.
I share your frustration but I’ve come to learn that a lot of people don’t process things contextually and have an extremely difficult time with problems or reading that require picking up context clues.
I assume you don't have to answer correctly on the crosswalk question, you just have to answer the way most humans answer the question when asked... but I have nothing to back that up.
I'm not sure. It used to be that you could just select whatever as long as you did it with a mouse (so it had human-like cursor movement). But lately reCaptcha and hCaptcha have both been yelling loudly every time I didn't select one square that had a car or staircase or whatever it makes you look for, even if that object is relatively small and easy to miss.
I think this is because the primary purpose of the exercise is AI training, though.
If you use advanced search features say 10 times in 10 minutes or whatever (a reasonable amount when refining a search if you ask me), you're quite liable to be elected to have a trial of endurance against the "prove you are a human" feature, having to solve multiple (my record is 16) consecutive "select all images that contain BLAH" tests.
I had to solve thousands of captchas as part of the yahoo groups archiving project. You only have to choose four of the images, whatever the test is, and it's not really precise, so you can make small mistakes and it still will let you pass.
>> CAPTCHA-hellban you get if you use "site:" in anything more than a tiny amount
> Please explain this point.
If Google thinks your searches are unusual, it will force you to answer captchas to see the results. They assume anyone using advanced features must be trying to abuse their service.
If your activity seems automated in some way, Google will give you a captcha and sometimes it'll give you one on every search even after you've completed one captcha. But the reason for this is probably a combination of IP usage (e.g. a VPN IP shared between users), browser anonymity, and how specific you're getting with your search results, and not just the fact that you've done 20 searches today with "site:".
It's the height of irony that automated processing produced the AI chatbots that are vogue today, but if your activity is automated, Google considers it a crime. I say irony but that implies the hypocrisy was surprising.
> On the other side, I've also noticed it appears to be aggressively pruning its index in the past few years
In terms of breadth and depth, the quality of google search has declined noticeably. They don't have any real competitors in search so they can do whatever they want.
> now it's nearly impossible for anything but the most vapid of queries.
Rather than giving us what we want, they want to give us what they want: a narrow band of approved results. Youtube is like this as well, but then again, youtube and google are both part of Alphabet. It's like google news was a test run and they slowly exported it to search, youtube, etc.
SERP: Search engine results page. I asked ChatGPT.
"SERP stands for Search Engine Results Page. It refers to the page displayed by a search engine in response to a user's query. When a user enters a search term or keyword, the search engine generates a list of relevant web pages and presents them in the form of a SERP. The SERP typically includes a combination of organic search results, which are the regular listings based on relevance to the query, and paid advertisements, which are sponsored listings that advertisers pay for to appear prominently on the page. SERPs often contain additional elements such as featured snippets, knowledge graphs, image or video results, local map results, and other specialized features, depending on the specific search engine and query."
Google Search is in decline for the users who know how the internet works. For my mother, the internet is still Google (and would remain like that no matter what). For me, for some friends of mine, and for many of my colleagues who know more or less how things work on the web, Google Search is just in free fall: we use it as last resort, but as other search engines (or other tools, like ChatGPT) improve over time, Google Search would just disappear from our bookmarks.
I know that many of you would say that it doesn't really matter what "hackers" think about Google Search, that all that matters is that the majority of the non-tech-savvy users still think Internet == Google. Well, let's talk again in 5 years.
There was a magic period of time lasting about a decade when internet search Just Worked. If it was on the public web Google would find it. Search worked so well I took it for granted.
Today I avoid Google search at all costs, and use it mainly to find Wikipedia pages or to search Reddit.
The Googlers blame SEO and there is some truth to that; but Google has on retainer a huge stable of the world’s best-paid engineers, and still couldn’t be bothered to invest in their flagship service.
I agree, Google up until ~2013 felt magical and nowadays even news.ycombinator.com, a site frequented by many search engineers, isn't fully indexed anymore, not even 80% indexed for some obscure search terms.
It would be trivial to insert banner ads, or even "native ads" in the response.
LLMs are probably even better suited for monetisation since they have a better understanding of what people want, so a better ad can be shown that is more likely to be clicked.
Do you think people were losing their shit when ChatGPT went into bing simply because web search gets better/easier? No - people were losing their shit because it meant the ads in search are going to be turbo-charged and so that is why the share prices are surging (GOOG up 40% over last 6 months): more ad revenue from "better" ads shown to users.
People still need products to solve problems. When you ask an AI "I have X problem, what products could I use to resolve it?" that is a natural and ethical time to include advertisers into the equation. The LLM can be trained on product specifications, details, and relevant uses, as well as plug in to a review database. Companies can pay to be included in the results of possible solutions and the AI can use the available information to make specific recommendations based on real data.
Hopefully, the future will completely prune intrusive, non-consensual advertising, and any companies that inject thoughts into our minds will fail.
For some problems. For many more you don't need a "product" at all but advertising still wants to sell you one.
> that is a natural and ethical time to include advertisers into the equation
Not at all. Ads means showing the product of whoever is paying the most or at least preferring paying products over others. Ad-free suggestions means showing the product best suited for the task. If those two match you are defrauding the advertiser by making them pay to show what you would have shown without ads. If they don't match you are degrading the service for the users.
That’s because it’s been progressively getting worse for years. For example, when they launched Google+ they stopped supporting the + operator in favour of “quotes” and people complained about that. Of course now the “quotes” don’t work either.
Google used to be amazing. If you remembered a set of words that were on a page, you could enter those words and it would find matching pages. It’s been broken for years and keeps getting more broken.
Yes it's broken, try searching 'python abs "Return the absolute value of a number"'. That string is directly lifted from the Python docs for the `abs` function. The official Python documentation does not appear for me until the 3rd page and the "featured snippet" is from some random website called flexiple.com
Well that’s the problem then. You could say your problem has more to do with the relevancy of the results than the exact match operator (I personally don’t mind in this case since for the few times I’ve had to write python I’ve found the official docs not as useful as I’d expect).
As I said, I use that operator all the time and it works. I’d be the first to complain when it doesn’t. For instance, the - operator stopped working in YouTube a while ago, unfortunately.
Firefox did most of the early damage to IE all on its own, taking IE down from an insane 90+% marketshare position and steadily eating up 30% marketshare by the time Chrome even really showed up. And it didn't even have to stealth-install itself with every Adobe Flash update to do it.
There were "tiny upticks" but, going slowly and steadily, in August 2008 Firefox had gained about 30-33% and IE was down to about 60%. And then (September 2008) Chrome happened.
I certainly start with google far less -- I now, very often, stray little from my bookmarks bar, the first of which is "New Chat" (ChatGPT).
I needed some somewhat obscure API information recently, and ChatGPT had it -- a testament to how much ChatGPT really is just a compression of "everything ever digitised"
Does ChatGPT “have it”? I thought it made stuff up and that the magic it does is being very good at making the stuff up.
We’ve had it come up with solutions using entirely made up functions. They had names that sounded like something Microsoft would’ve put into .Net, but they were entirely made up. As in, they had never existed in any version of .Net ever.
So as much as I like it, I’m treating it with more caution than google results. Honestly though, most of the time it’s frankly faster to just read the damn manual and figure out things for yourself. I don’t say that as some sort of anti-prompt-programming purist, but wading through GPT responses is about as hopeless as wading through the gazillion medium, dev.to, stack overflow and whatever else people post their useless stuff on. 10 years ago if I needed to do some obscure Sharepoint programming (waaaaaay out of my field) I could realistically make something work with the help of Google; today the same thing is frankly completely impossible.
It's actually quite interesting just how good it is at making things up. When I first started using it I didn't realise it even could, so when I got a non-existent GDScript function, I started to prod it to see where it came from. It was able to explain the function, tell me when it was added (of course Godot is open source, so it would in theory have access to this) and even the commit hash used to add it, and all of it sounded very plausible. It was only when I pointed out that it doesn't exist that it admitted it.
Admittedly that was on GPT-3; I haven't tried GPT-4 as I can't afford it at the moment. No doubt it's better, but I'm not sure how much so.
It's always making things up. It's fundamentally a coincidence when it gets it right.
That's the nature of associative statistics -- and why this talk of "hallucination" is more marketing PR.
We hallucinate in that our reconstructions of our environment can be caused by internal states (e.g., dreaming) -- whereas veridical perceptual states are usually caused directly by the environment.
Here, its states (i.e., the statistical averaging process over its training data) are NEVER caused veridically -- i.e., its prompts are never caused by the environment.
GPT doesn't give correct answers. It gives answers that sound correct.
Those correct-sounding answers are often actually correct, but this is more coincidence than design. Anything it says is suspect, and should be fact-checked.
The more ChatGPT is just remembering its training content, the better it is. In this case it had clearly remembered just that API (an obscure educational API) -- with weird parameter key dictionaries etc. that aren't in any way some Intelligent Generalisation (oh wow! everyone fund this!!11!)
Insofar as it isn't just a regurgitation of "the better ebooks, blogs and docs", the worse it is.
This is why, when prompting ChatGPT, I'm more often aiming to have it use examples (etc.) that have a high likelihood of exact data in its training set.
Consider eg., the prompt, "write tic-tac-toe in javascript using html and canvas" to "write duck hunt in javascript using html and canvas"
The latter is extremely hard to get out of it, with many prompts -- the former, immediate and perfect.
Why? Because there are many examples of tic-tac-toe.
> Does ChatGPT “have it”? I thought it made stuff up and that the magic it does is being very good at making the stuff up
ChatGPT is just a frontend UI and refers to two different models (and different tunings of these models over time I guess).
GPT3.5 is kind of useful but it makes up stuff just enough that you can't really trust it and spend so much time verifying it's hard to say whether you saved time vs the old way. But it still produces mostly not made up stuff.
GPT-4, still not perfect, is a game changer though. It's what people generally mean when they talk about ChatGPT. There's far, far less hallucination going on (not zero though). There are several ways to access it: phind, bing, ChatGPT. But I still think ChatGPT is the best.
Genuine question: if Google is a last resort, what are your more preferred alternatives? I've tried many search engines and haven't really succeeded in finding a better alternative. Wondering if I'm missing something.
Kagi (paid) and Brave search have both been good to me.
I never really liked DDG or Bing. They both suffer from the same SEO gaming that Google does.
There are also a number of smaller search engines that have been posted to HN that are kind of interesting for certain niches.
I think what will happen over the next five years is that instead of Google being the one-stop shop for search, it will be a number of smaller players + the different chatbot engines like ChatGPT and Bard and others.
I tried many alternatives, and finally settled on Kagi Search. It's a paid option but it's well worth it. They also have tools like FastGPT which I'm using more and more. It's a GPT model that has access to their search results. You can't use it for conversation like you'd do with ChatGPT, but for searching and summarizing it's amazing.
What search engines would you recommend? I'm having difficulty using google, its just practically useless these days. ChatGPT is much much better but I'd like to know other tools
I think it's actually a "u"-shaped curve. For people with no idea, Google works great. For people with vague and mostly inaccurate knowledge of how to index and search the web, it appears to suck. For people who actually know, it once again seems good.
It seems obvious to me the importance of organic results has fallen over the years. There's plenty of queries where you have to scroll down a fair amount to even see them. Pushed down in favor of ads, various widgets, content sourced from Wikipedia and published on google's urls, etc. Things that sit below the fold get less overall money, time, and attention.
With generative AI search results, soon you won't even be able to know whether your site was used for the result or not. Lots of no-click queries, resulting in no traffic for the publisher.
The much more worrying fact about AI is you won't even be able to know whether the information you're getting is true. I always scroll past the crap at the top to get to the actual site results.
Given the web as it is today is infested with clickbait, "native content", clout-chasing, undisclosed sponsorship, and other such pathologies, I'm not convinced that the fear of AI making truth more rare is rational.
Yeah, had this happen already. I remembered that some normally purely carnivorous type of animal had a herbivorous species. It was spiders, but I searched for snakes first. One of the top results was an article about how boas can be fed a diet of fruit (they cannot), which must have been AI-written, given how many other semi-nonsense articles that site had.
This kind of problem is also showing up on Quora. Some of the answers I've spotted are so obviously wrong that you can tell a person didn't write them.
That might have been worrying if “I” was known to reliably provide true information, but it never has, so we’re used to knowing that information probably isn’t. Adding an “A” to the equation changes nothing.
LLMs are significantly less likely to be accurate but are quite good at fooling people. The problem is our existing BS detectors no longer work well. It's surprisingly close to talking to a talented con man.
Was it? Go down to the local coffeeshop for the daily gossip and you'll hear all kinds of things that aren't true. People love to make stuff up and we have always needed to rely on the concept of credible sources. There is nothing to suggest those are going away.
I asked ChatGPT to give me links to some sources for one of its answers and it responded it didn't have access to the internet. I think this could be "solveable" by adding a "show your work" or "provide references" kind of feature in a future iteration.
They even describe how to mark up your paywalled stuff to help them differentiate it from cloaking.[1]
"This structured data helps Google differentiate paywalled content from the practice of cloaking, which violates spam policies."
Which is just odd to me. Why present a paywalled search result, when the market is so fragmented that the odds the user has a subscription are so small?
Feels like a no win situation for Google. Do you show results that people don't want to pay for, or do you not show any subscription needed results and people accuse you of monopoly behavior only showing sites running adsense.
Right? It doesn't have to fully elide the result even. If the top of the SERP said "3 results hidden because they are paid sites on your non-subscribed list" I would appreciate the info.
They had a simple rule that they index what users can see and if you try to cheat your way around that you get ranked down. Sticking to that for everyone would have been an iron clad defense against monopoly accusations. Instead they chose to help certain corporations (or at least corporations with certain business models) present different content to the bot and users.
Add Steam forums to those. Nine times out of ten, game walkthroughs or hints can only be found on Steam, according to whichever search engine. But only if you have a login there, of course...
Indeed, I just checked, it's not the entire Steam community site. Looks like it's a per-subcommunity (per-game?) setting whether the community section is open to the public or not. Luckily there's a lot of content that's not behind a login wall.
Cloudflare doesn't cache pages by default as that would create issues (logins, comments, admin sections, etc). They cache static files (images, css, js, etc), but that's it. You need to set it up yourself and have ways to bypass and purge the cache.
> Cloudflare doesn't cache pages by default as that would create issues (logins, comments, admin sections, etc). They cache static files (images, css, js, etc), but that's it. You need to set it up yourself and have ways to bypass and purge the cache.
Or they could just rely on Cache-Control headers (and friends) instead of requiring CF-specific configuration.
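As a rough illustration of that approach, here's a minimal sketch (using Flask purely as an example framework, not anything mentioned above) of an origin setting standard Cache-Control headers so that any cache honoring them can decide what to keep:

```python
from flask import Flask, make_response

app = Flask(__name__)

@app.route("/article/<slug>")
def article(slug):
    # Public content: any intermediary cache that honors standard headers
    # may keep it for a while.
    resp = make_response(f"<h1>{slug}</h1>")
    resp.headers["Cache-Control"] = "public, max-age=300, s-maxage=3600"
    return resp

@app.route("/account")
def account():
    # Personalized content: tell every cache to leave it alone.
    resp = make_response("account page")
    resp.headers["Cache-Control"] = "private, no-store"
    return resp

if __name__ == "__main__":
    app.run()  # quick local check only
```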
I also noticed it takes way more time to index sites.
When I moved bitecode.substack.com to https://bitecode.dev, I submitted the sitemap to google and was surprised that it took almost a week to even show a single page as indexed.
This used to take a couple of hours to a couple of days.
So something definitely changed, but as usual we can only speculate until a googler come in this thread.
Just more spam and scams than they can handle. 0.5% of all the shit on the Internet is real content. And that stat is from 2015.
Their own fault for monetizing every imaginable search query.
It has provided the incentive for every spammer and scammer on the planet to mass-produce copious amounts of fake interlinking pages, massively larger than the actual Internet.
Vernor Vinge was wrong about humanity's vile offspring being stock trading bots, and the sci-fi authors were wrong about Grey Goo being a nanobot scenario. We're awash in grey goo vile offspring right now, consuming resources at an increasing rate and turning everything bad - and it's information; SEO spam and content farms and LLM waffle.
Pretty soon the only way to live will be an Encyclopaedia, text books, and cookbooks from before 2010.
It's not a static world. Things are always changing. I have some faith Google (the ad-tech part) will start crumbling. ChatGPT has given them a jolt, and there are more jolts to come, not fewer: from the regulators, and from the advertisers who know most of the views they get are from bots or totally worthless, and that they don't need to be spending what they are spending.
I'm not sure this fixes anything. GPT/LLM will be able to produce monumental amounts of reasonable sounding bullshit. Google may crumble and fail, but all we'll be left with is small islands of sanctuary in an ocean of bullshit.
Google contributes to the bullshit explosion by incentivizing people to game the rankings and results. Same with Facebook's newsfeed, and everyone else that relies on ads to provide their "free" services. Everyone is gaming these services because there is money to be made if you end up at the top. Including HN. The failed assumption being: whatever is at the top = quality. These solutions haven't solved the information explosion problem. They have made it worse.
Ironically, the free services like search ranking, news feeds, like counting etc. were the initial half-thought-out response to the info/content explosion the early internet produced. They just forgot that was their goal and moved the goal posts from getting a handle on the info explosion to rewarding attention capture, maximising engagement and other useless shit that contributes further to info pollution.
So when Google/FB etc. start crumbling, incentives for the spammers and scammers will start dropping too.
This seems to be a naive point of view. Google is a giant piece of shit, but to think they are the only piece of shit is a failure of imagination at the highest order.
The billions Google controls don't disappear if Google does. Instead every scammer starts looking for new markets to spread spam in. And this is just marketing; we're not even talking about politics and the firehose of falsehood.
The problem may not be solvable which is concerning as we are at risk of drowning in bullshit.
I'm not denying that, or saying everything is going to change in an instant. I am saying we are past the point where Google and the attention economy they raised keep running the way they did for the last 20 years. We are going to see flat revenues, more layoffs, fewer data centers getting built, fewer "free services", multiple large countries bringing out legislation on how personal data and algos can be used, more huge fines, etc. There is even a UN report released this year talking about how the attention economy should really work, covering issues on the social, political, ethical and economic front. On top of it, the telcos worldwide are all in huge debt. They have been trapped in a cycle of building/upgrading the pipes on the belief that all the data flowing through them is oil. But if it's all sewage, then how long does the cycle run? It's going to break down.
The attention economy is the old news now... welcome to the intimacy economy brought to you by AI.
And data was the oil... it was saved in lakes and used to build the electronic minds we have now created, and those electronic minds are going to work as nuclear reactors that will process the data we now create, and the data they will create as they are embodied and put into the world. It is going to break down, but not in a way where we all go back to a disconnected world without surveillance capitalism; no, the future is going to be far worse than you can imagine when it comes to that.
>Telcos worldwide are all in huge debt.
This has nothing really to do with the economics of selling data; it has to do with companies taking out far larger loans than they needed and using them to enrich their shareholders.
This comment is now the top and only search result for "but as usual we can only speculate until a googler come in this thread." minutes after you posted it. I don't think freshness of the index is a criticism you can credibly use against Google.
> I have been a web designer since 2016, and before that I was a blogger for 6 plus years. I have been deeply interested in Google SEO since 2010, and in all that time I’ve never heard of Google not indexing a site.
Google crawls sites it assumes will have useful content. If it doesn't have that assumption for your site, you need to feed it.
The convention site in question had a GA tag, it was in Search Console, and I had submitted a sitemap. Search Console knew the site had 9 pages in the sitemap, and had only indexed the homepage.
The other pages weren't rejected, declined or 404ed - they were simply ignored.
> If you want to be indexed, submit your pages to Google explicitly through a GA tag, a Google Search Console account, and a sitemap.
What if I don't want to put any Google-related stuff on my site?
I have a GitHub Pages site on a very unique topic and it's still not indexed by Google (the site has been available for at least 8 months now). I've checked Bing and my site is in 4th place on the results page.
So are we all now officially complaining that it stinks where we've SEO-sh*tted for two decades?
Good, then let's get rid of this ad-network dystopia which never worked as advertised (they show me special offers for a frying pan two weeks after I ordered one; laughable, folks). Replace it with great, authentic, inspirational content and people will throw their 2 bucks at it.
What's the point of censoring only one character from the word "shitted"? Both you and everyone that reads your comment know exactly what you are saying, so it's not like you are shielding anyone from anything. And neither is there anyone who is going to punish you for saying a naughty word on the internet. What are you afraid is going to happen if you write "shitted"?
That is a disappointing answer. I was hoping for the truth, but got a poor attempt at humour instead.
I genuinely want to know the real answer, because this kind of behaviour confuses me. And every time I ask someone about it they don't answer, or get a non-answer like yours.
If HN blocked comments with certain words, I'd understand.
If you genuinely find shit an offensive word, and instead chose to use some other word instead, I would understand.
Too bad that the powers that be have decided that payments must be identified, so such a new internet will by necessity not be anonymous, but instead tracked and identified in every aspect.
There's a difference between me deciding which payment method I use for service X and having service X decide into which tracking networks they put me in.
Additionally, Mullvad showed how to use cash via regular post to pay 100% anonymously for service X.
Difficult to comment without URLs, there's too many potential causes
Unfortunately "[being] a web designer since 2016" is generally not helpful for SEO
But certainly setting up Google Search Console, submitting a sitemap, and 'fetching' your homepage on day 1 is good practice, especially for a brand new website
I can also vouch that what the author is saying is true. I've started publishing my notes, and I found that after three weeks none of them were indexed (and yes I submit a sitemap through google webmaster tools). I have since started using the URL inspector tool and adding them one by one. That does work.
If you're mass-producing low-quality sites you would do that on day 1 with each of them. So it likely doesn't weigh positively as much as we would hope.
Once again the black hats will come out on top. As someone in that industry, there's always a new trick or method for getting your sites indexed en masse before others. If you don't have time to find it, someone in a 2nd world country (probably India) will offer the service relatively cheap at BlackHatWorld.
On a side note on the SERP (search engine results page) of the foreseeable future, I don't accept the doom and gloom other SEOs have. Google has shown it will give credit to the top 3 links, and to be honest pretty visible credit. This will shift the target from "front page" to "top 3" sort of like Local Seo Map Packs.
Will make difficult keywords more difficult to get clicks for, but long tail keywords could be easier to rank now as a #3 ranking could garner as many clicks as a #1 ranking.
Google is an advertising & data-collection business, not a search company. Search is their on-ramp for attracting sources of data. If your website is not worth much on either side of the business, you'll be a low-priority target.
Maybe the concept of "web search" needs to be reimagined all together?
Perhaps the idea of creating a master index of the content of multiple pages on nearly every site is the thing that is not sustainable. Instead, "search this site" needs to be made great again. Individual websites could manage their own search in a way that complies with a standard API that can be consumed by meta-search engines. Rather than indexing pages in the traditional way, meta-search engines instead use a heuristic or AI model to decide which sites are going to have the kind of information you are searching for, perform your query on their own search, and return the aggregated results to the user.

The less the algorithm understands the significance or meaning of the query, the more generalized its approach can be. For instance, if it thinks that you're searching specifically for opinion-based content that will appear on blogs and forums, then it will target a federated search engine that indexes those things specifically. But if you are searching for information on making beer at home, it will know to target and weight the search engines on brewersfriend.com and homebrew.stackexchange.com.

Although this sounds not that different from how search works today, remember that this idea is about having search become more federated and more standardized, and for meta-search to select federated indexes rather than own a god-index. A user of a meta-search can pick and choose what indexes they want available in their searches in case they find any of them to be either superior or particularly problematic, and the meta-search can optionally adjust its understanding of a particular user's queries.
The way I see it, traditional search will continue to decline in part because it's not sustainable, but also in response to AI allowing them to become "answer engines". Although a lot of people do want an answer engine, this isn't for everyone. I think there will always be a market for people looking for content on specific webpages. Whatever that thing is that someday snatches that market away from The Google, if it's going to be successful, won't survive on the current concept of what "search" is.
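To make the proposal above a bit more concrete, here's a minimal sketch of what the meta-search side could look like, assuming (hypothetically) that each federated index exposed a common /search?q= JSON endpoint. The endpoints, response shape, and scoring below are invented for illustration; no such standard exists today:

```python
import concurrent.futures
import requests

# Hypothetical federated indexes exposing a common /search?q= JSON API.
# The .example domains are placeholders, not real services.
FEDERATED_INDEXES = [
    "https://search.brewersfriend.example/search",
    "https://search.homebrew-forum.example/search",
]

def query_index(endpoint: str, query: str) -> list:
    """Ask one site-level index for results; tolerate failures."""
    try:
        resp = requests.get(endpoint, params={"q": query}, timeout=3)
        resp.raise_for_status()
        return resp.json().get("results", [])
    except requests.RequestException:
        return []

def meta_search(query: str) -> list:
    """Fan the query out to every federated index and merge by score."""
    with concurrent.futures.ThreadPoolExecutor() as pool:
        batches = list(pool.map(lambda ep: query_index(ep, query), FEDERATED_INDEXES))
    merged = [hit for batch in batches for hit in batch]
    # Each index is assumed to return {"url": ..., "title": ..., "score": ...}.
    return sorted(merged, key=lambda hit: hit.get("score", 0), reverse=True)

if __name__ == "__main__":
    for hit in meta_search("dry hopping timing"):
        print(hit.get("score"), hit.get("url"))
```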
My guess is that Google doesn't care (not that they ever did but now less than they did before). The simple math is "Google makes no money by indexing your website." The easiest way in the past to get reliably indexed by Google was to put Google ads on your web site. Then they have skin in the game so to speak. "Long tail" websites, those that are very niche, not especially linked outside of a few folks, have only ever been there to impress people using Google into believing they index everything (which they don't, and frankly haven't for at least the last 10 years).
Over the years, as the ways that "search engine" operators exploited the "digital exhaust" of the people who used them and sold that information for profit have been revealed, regulations and laws have been enacted to reduce or eliminate the most egregious (and usually most profitable) practices. Between that and advertisers tiring of paying for "engagement" while seeing few sales or conversions, the days of running a search engine as a high-margin, make-money-hand-over-fist business are slowly winding down.
That Bing has (for now at least) continued to index these sites will just accelerate Google's decline.
When I was helping to run Blekko, people would look at our curated results and be super impressed at how much better they were than Bing or Google results, but then they would search for their cousin's minecraft blog and wouldn't find it and lament that they couldn't possibly use it as their "daily driver" because if it didn't have their cousin's website in it, how could they know what they weren't seeing? Blekko tried really hard to make the argument that if you made a list of the minecraft blogs you followed as a Blekko user we'd index all of them, and if everyone did that for their favorite stuff then the index would fill up with good web sites and not be riddled with junk. But sadly we couldn't get them to internalize that and they weren't willing to create an account to have a better web experience. Perhaps we were just too early but still it is a weird thing.
I think it's important to realize that search engines both bring websites to people and people to websites. The genius part of Google's position in the market is that they both sell ads, and then direct people to the places where the ads are sold. This is also a major contributor to their search engine spam problem; they can't well penalize ads without undermining their core business model.
Although I don't think any search engine has ever indexed or will ever index everything. That doesn't really make sense. Realistically I think maybe 1% of the documents online are ever going to be a good search result for any query ever.
Internet search is all about being judicious about what you index.
>>> I think it's important to realize that search engines both bring websites to people and people to websites.
I don't disagree with this, I was simply pointing out that Google makes no money by sending people to a web site unless the owner of that web site has bought an advertisement and the person involved has clicked on that ad. Organic search, like "good explanation for approximation theory" should have something like this: https://xn--2-umb.com/22/approximation/ as one of its top results but it doesn't even make it to the first page.
The caveat is that you can only request that Google index up to 200 URLs.
It's also technically not the most 'correct' way to have your pages indexed (it's supposedly meant for job posting updates, etc.).
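For context, that quota and the "job postings" caveat refer to Google's Indexing API. A hedged sketch of what a submission looks like is below; the service-account file path and the example URL are placeholders:

```python
from google.oauth2 import service_account
from google.auth.transport.requests import AuthorizedSession

# Hedged sketch of Google's Indexing API, which is what the ~200-URL quota
# refers to. Officially it is only intended for JobPosting/BroadcastEvent
# pages. The service-account file path and the URL below are placeholders.
SCOPES = ["https://www.googleapis.com/auth/indexing"]
ENDPOINT = "https://indexing.googleapis.com/v3/urlNotifications:publish"

def request_indexing(session: AuthorizedSession, url: str) -> dict:
    """Notify Google that a URL was added or updated."""
    response = session.post(ENDPOINT, json={"url": url, "type": "URL_UPDATED"})
    response.raise_for_status()
    return response.json()

if __name__ == "__main__":
    credentials = service_account.Credentials.from_service_account_file(
        "service-account.json", scopes=SCOPES  # placeholder path
    )
    session = AuthorizedSession(credentials)
    print(request_indexing(session, "https://example.com/some-page"))
```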
I don't think this is a bug. After all, why waste resources indexing pages 2 .... n when nobody even goes there? If something accrues signals outside of Google naturally, then I am sure it will get indexed. On this note, I don't want to be another 'SEO is Dead' person, because it is never dead, it just changes, but I do suspect over the longer term, links will become less relevant and Google will take a more human-curated approach to ranking selection using LLMs.
Well, I can't say for everyone, but I did notice that Google stopped indexing a few sites I manage. No technical reason, no error; it just says "Not indexed - reason: Discovered - currently not indexed", and that's most of the content of the site (nothing fancy, mostly technical articles and other notes). Unclear why this is happening.
Things are pretty bad with Google in this department. Over the last two years they have been constantly updating and changing their algorithm, there are now updates such as Core, Product Reviews, Helpful, Spam, Link Spam and God knows what else. And they are rotating them 24/7. Literally.
You could be up 20% one day and down 40% the next and you are none the wiser as to why.
If you are a small time publisher with no budget or no craftiness to attract links from other sites - you are pretty much doomed.
I have a few sites that are in the 200k monthly visits range (from Google) and Google only fetches the homepage/feed every four hours or so, sometimes it takes longer than that. It’s a lot different from what it used to be.
I have a year-old WP site that showed up on Google, but not all pages were being indexed. I submitted the .xml sitemap via
https://www.google.com/ping?sitemap="url"
But that didn't seem fully effective. I then succumbed to uploading a verification file to the root directory (requires a Google account) and resubmitted an index request. Within 12 hours a Google search of
Site:"my url"
yielded all pages. I'm terribly rusty with websites, hence my use of WordPress and willingness to taint my root directory with Google files. I do notice that exact, relevant queries in quotes still show no results for some content. Much to relearn.
I don't remember the initial loc. It's now in the root dir as mentioned. However, I'm still not pleased with the results, but perhaps it takes time. I'm sure a backlink or two wouldn't hurt.
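For anyone curious, the sitemap "ping" mentioned above is just a GET request; a small sketch follows, with the sitemap URL as a placeholder. Note that Google has since announced this ping endpoint is deprecated, so submitting the sitemap through Search Console is the more reliable route:

```python
import urllib.error
import urllib.parse
import urllib.request

# Sketch of the sitemap ping described above. Google has announced this
# endpoint is deprecated, so don't be surprised if it stops answering.
# The sitemap URL below is a placeholder.
SITEMAP_URL = "https://example.com/sitemap.xml"
PING = "https://www.google.com/ping?sitemap=" + urllib.parse.quote(SITEMAP_URL, safe="")

try:
    with urllib.request.urlopen(PING, timeout=10) as response:
        print("ping returned HTTP", response.status)
except urllib.error.HTTPError as err:
    print("ping rejected:", err.code)
```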
I've done everything right according to Google, set up the search console, uploaded a site map, addressed all mobile usability issues. And yet only a tiny fraction of my content is being indexed.
I'm a bit at the end of my rope here as I've poured a year into this project and getting a historically normal amount of search traffic may be the difference between this project being viable or not. The most frustrating part here is having zero visibility into what's going on.
Your site has very little "content" and is just product listings, which looks like a million other garbage sites Google sees. Great for UX to have the product listings users want, but the crawlers need some plain text to read to understand what your site is about.
I'd add a paragraph, a hero image, a CTA, etc. at the very top explaining what your site is. Additionally you need a menu at the top and a footer at the bottom with links to additional content-only (not product) pages - i.e. an about us, where we source data, etc. Even 2 blog posts would help a ton. Do not stuff them with keywords, but be sure to use the words that are common in your niche of the industry so your site gets associated with industry sites.
In Google's eyes, the difference between a scammy online ecom site and your site is hard to see! (Even if your site provides legitimate value to users.)
You can try posting a link to it on various digital marketing subreddits (not the "rate my website" ones) to see if you can get more feedback - I haven't done that in years though.
Edit - also, I didn't realize clicking product links takes you to external sites. That's a tough site for Google to ever understand correctly, since you have so little content and the best possible outcome of a user visiting your site is that they leave it. Maybe set it up to have each product link to your own page for it, maybe with price history from camelcamelcamel, links to the product on other sites, generic info about what site it's listed on, or just crawl the description at the vendor's site.
From a UX perspective, your site is perfectly fine. From an SEO perspective, though, the brand/category pages don't have enough content (in Google's opinion) to be uniquely relevant. Even though a user would find the faceting/filtering functionality highly useful, Google uses things like word count, TF-IDF and topic relevancy as signals (albeit easily gamed) to surface "relevant" pages. This is why recipe sites all have 500+ word intros before each recipe and even more on category pages.
Backlinks also matter, for both domain and page authority. You are competing with 20+year old domains from large companies-- why should you (or any new site) get ranked before dickssportinggoods.com who have top tier backlinks (graph network, implies trustworthiness) from sites like Espn.com? Google likely uses CrUX data for ranking (because 2023 backlinking is vastly different than 2012), so high engagement from users is likely a KPI to focus on, in addition to backlinks (both branded and inclusive of terms/pages you want to rank for).
There are obviously hundreds of other factors (with different weights), with dozens or hundreds of tests running at any given time, but those few factors are what have remained relatively consistent over time. That is partially why Google's results are so bad: it is only a matter of time for people to figure out what matters and then optimize against it. What is best for the user may not be the best for Google, sites or advertisers. Unfortunately, many times the best content isn't visible, because people capable of marketing have a leg up versus those who just want to provide utility.
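As a toy illustration of the TF-IDF style signal mentioned above (the two category-page snippets are invented, not real crawl data), a thin listing page simply carries no weight for most of the vocabulary a richer page covers:

```python
import math
import re
from collections import Counter

# Toy illustration of a TF-IDF style "content depth" signal. The two
# category-page snippets below are made up, not real crawl data.
docs = {
    "thin-category": "running shoes sale discount",
    "rich-category": (
        "running shoes guide covering trail versus road shoes, sizing tips, "
        "cushioning, heel drop, and how each brand times its discounts"
    ),
}

def tokenize(text: str) -> list:
    """Lowercase word tokens with punctuation stripped."""
    return re.findall(r"[a-z]+", text.lower())

def tf_idf(term: str, doc: str, corpus: dict) -> float:
    """Term frequency in one doc, weighted by the term's rarity in the corpus."""
    words = tokenize(doc)
    tf = Counter(words)[term] / max(len(words), 1)
    containing = sum(1 for text in corpus.values() if term in tokenize(text))
    idf = math.log((1 + len(corpus)) / (1 + containing)) + 1
    return tf * idf

for name, text in docs.items():
    score = tf_idf("cushioning", text, docs)
    print(f"{name}: {len(tokenize(text))} words, tf-idf('cushioning') = {score:.3f}")
```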
I think the issue here is that whilst your site may have utility, it has zero original/unique content. To Google, this looks like a link farm.
You might argue that your discount price is "unique content" but good luck getting Google to understand that. Plus, I imagine those shoe names are competitive queries, which means you're up against paid ads. Further, I assume your discounts are time-sensitive, which means unreliable indexing is not a good solution for up-to-date pricing.
Yup, this confused me the other day - I tried to search for an article I'd written for Dr. Dobb's Journal in 2001 on a very specific topic. No amount of keywords, including my name, technical keywords etc., could find it.
I assumed that meant it had been deleted from the web; the publication had disappeared 10 years ago. Then I saw I'd linked to it from my own site, and the link worked just fine :facepalm:
It's weird to see Google forget the kind of content I used to go to them for in the first place.
I've seen this on a site that I built for a freelance client a couple years ago. At first they only indexed the homepage and some sporadic articles. Eventually one specific article was linked on an .edu website, and now that article is reliably indexed as well.
If I manually request that they index a page, it always succeeds and shows up on Google within a few days, but the page gets pruned from the index within a month or two.
The weird thing is that within the past couple years I've also seen unwanted indexing, e.g. low-spec staging servers getting wrecked by Google crawler traffic. I don't think the "no automatic indexing" treatment is standard for every site, so there must be something that triggers it, but I've spent a long time unsuccessfully trying to pin down the cause.
For some reason it refuses to index my site(s) until I link the site to my Google account in Search Console. I had organically placed a lot of links (on my GH and LinkedIn profiles, no spam anywhere). It did not index even those links for over a month, despite me searching the exact domain multiple times. I guess doing it via Search Console just showed it the exact domains it has to crawl, making its job easier.
Well, I can't really comment without a URL, but one thing I found majorly annoying is how sticky "noindex" has become. It used to be that you could put your staging site on noindex, put it out there, run a lot of external testing tools (PageSpeed Insights, mobile-friendly tools, webpagetest.org, 10000 other tests that can easily be done on the open internet) and send the page around, and then on publishing day just remove the noindex, trigger recrawling, and be indexed and visible in Google in no time.
Now it takes weeks to months for the noindex to vanish. Even after Google has crawled the pages again and again, as visible in the logfiles, the pages stay noindexed even though the directive is long gone.
The past couple weeks there have been a few times I've been completely startled that google hadn't indexed a site. One of them is a forum that google used to completely index and now I have to use the forum search itself to find old posts.
My personal guess: with ChatGPT and other locally runnable LLMs being commonplace now, Google just can't keep up anymore. There must be millions upon millions of SEO spam pages being created and updated every day.
They lost the battle, at least for now.
This also lines up with Google Search results becoming increasingly worse over the last few years, but the last months in particular.
Hmm, this is the best guess here IMO. This person did say it started occurring within the last year though. I just created a site a month ago with 2000+ pages and I did notice only 5 were indexed, which was quite annoying. I just suspected it was because I didn't have enough backlinks yet, so it didn't want to waste time on it... interesting that it may be what this original article is about. But yes, I 100% bet it's because all new sites now face an enormous barrier with the sheer amount of fake content that can be generated.
There are an infinite number of URLs on the internet, Google obviously cannot crawl and index them all. They choose what to crawl and index based on mainly site reputation and inbound links, just like they used to 20 years ago.
If they aren't crawling or indexing your site, then link to it from a high reputation place.
I never thought they did it 'automatically'. My impression was that they always had to have found it via a link from another site, or that the site owner had to submit the site to Google. It isn't like Google is seeking out all registered domains and checking each one to see if a site suddenly appeared on it.
So if someone registers a new domain, puts a site on it, doesn't link to it from anywhere else, doesn't submit it to Google... and yet expects the site to be found by Google, that is just not a reasonable expectation.
Google has to know about the site before it can index it.
Set up the sitemap, then submit it through Google Search Console, and install Google Analytics. This will help Google pick up that your site exists.
Make sure your robots.txt file is configured to allow crawlers. Make sure your pages aren't inadvertently NOINDEX'd.
SEO isn't as relevant as it used to be, but all this stuff should be part of your QA and pre-launch checklist.
Set up uptime monitors on every page that gets more than 1% of your traffic... check page load speeds and HTTP response codes -- you never know when a WAF or some other system will get mucked up.
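A rough sketch of that kind of check, with placeholder URLs, might look like this: hit each important page, record the status code and a crude load time, and flag any accidentally shipped noindex header or meta tag:

```python
import time
import requests

# Rough sketch of the pre-launch checks described above: HTTP status, a
# crude load-time number, and accidentally shipped noindex directives
# (header or meta tag). The URL list is a placeholder.
PAGES = [
    "https://example.com/",
    "https://example.com/about",
]

def check(url: str) -> None:
    start = time.monotonic()
    resp = requests.get(url, timeout=10)
    elapsed = time.monotonic() - start
    header_noindex = "noindex" in resp.headers.get("X-Robots-Tag", "").lower()
    # Very crude meta check -- good enough to flag pages worth a closer look.
    meta_noindex = "noindex" in resp.text.lower() and "robots" in resp.text.lower()
    print(f"{url}: HTTP {resp.status_code}, {elapsed:.2f}s, "
          f"noindex={header_noindex or meta_noindex}")

for page in PAGES:
    check(page)
```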
My website used to be hosted by Google Sites. Then I moved it to something more "real" and deleted the Google sites version. The indexer still looks at their cache of the last version that was in sites from 5 or 7 years ago.
So... the Google indexer doesn't work? SHOCKED! I AM SHOCKED TO FIND GOOGLE DOESN'T WORK!
I think search results have become worse over time for three main reasons, firstly the amount of absolute drivel that's published on the web, made doubly worse with AI written drivel. It's honestly shocking. Secondly, the web has become more centralised and people use aggregators more than individual sites or services (facebook, reddit, etc.). Thirdly, the internet has become 'safer' in a lot of ways and seemingly the search engines scrub what is probably the majority of results from ever being returned. It used to be quite easy to find a pdf of a book on some open web server from a google search alone, now it's nearly impossible. And I don't think that's because people's security hygiene has improved.
Indeed this very article doesn't come up in searches like [ddg site:natehoffelder.com] or [noindex site:natehoffelder.com]. And the article has been up since at least May 30. So yeah, looks like an outage at Google on the crawling side.
Google is part of a profit-maximizing company. Slowly, they are transitioning from showing relevant results and some ad in a separate box to results that maximize profit directly because they’re sponsored or whatever. The founders even explained it in one of their papers.
Speculation:
Crawling the web is expensive. So it makes sense that they decide for each crawled page if it's profitable to put that into the index. In the long run they might not crawl the web at all. People will just pay them to get into the index.
Google has been so weirdly different and off lately. I miss when I could type anything and get exactly what I was looking for. Those days died at least ten years ago.
Sidenote: I wish Google would lower the ranking on websites who push for a mobile app and degrade your web browsing experience. There's no point in even getting them as a search result. If I wanted something from an app, I would have just gone on the app store.
Another thing I noticed: a weird prioritization of certain sources. E.g., I have a website that has now been up for 24 years (there's even some content older than this, which has moved there); it has my name on it and my name is in the metadata, and - quite naturally - it used to be the top result when you googled me.
Recently, I discovered that the top results are now some university pages that are not maintained, or even have never been set up. Which really undermines the purpose of the site. (Notably, the site has quite a high number of reported Google search hits, has rather perfect Lighthouse scores, and SEO ratings are high enough that I get SEO related requests – which are happily ignored – on a rather regular basis. It scores about rank 60K in the "Majestic Million". And the site enjoys a few updates/fresh content a few times a year. So this is not a case of obscurity.)
I have recently started a new web project ( https://industrydecarbonization.com/ ) where I am relatively closely following how it does on search engines, and I can't say that I share those experiences.
It took a few weeks until google noticed that it's a page with relevant content, but that's kinda expected. But once it did I feel Google is indexing my pages extremely fast, so fast that I have been wondering how they're actually doing this. I post new content on various social media sites, and my best guess is that google gets some of them as direct feeds that they check for interesting links. Google does not support indexnow, and as far as I know also no similar feature (except manually via the search console), so I'm not in any way directly submitting my content to Google.
That is why I wrote [1] for myself. It stores links in a database, which I can query. Everything is later exported, as in [2] and [3]. I can browse history, and I can find useful data. I don't say it has replaced Google for me; it is a nice addition that has helped me gather data I encounter on the Internet.
It is a link database, at first glance resembles Reddit clone, but my focus is on creating link database, not on providing social media experience cancer.
An alternative web search architecture is collaboratively sharing the sites we make or visit.
I made a self-hosted spider that only indexes sites I've visited, and a search engine with "and"/"or" logic.
I'm thinking of making it a federated search engine by allowing each user to whitelist domains that aren't sensitive to search with peers. And users can follow/block other users explicitly to avoid spam.
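Not the commenter's actual code, but a minimal sketch of that idea: an inverted index over pages you've actually visited, with simple AND/OR query logic. The indexed pages below are placeholders:

```python
from collections import defaultdict

# Minimal sketch of "index only what I've visited" with AND/OR queries.
# The pages added below are made-up placeholders.
index = defaultdict(set)

def add_page(url: str, text: str) -> None:
    """Index a page that was actually visited."""
    for word in text.lower().split():
        index[word].add(url)

def search(query: str) -> set:
    """Support queries like 'foo AND bar' or 'foo OR bar'."""
    if " AND " in query:
        terms = query.split(" AND ")
        results = [index[t.strip().lower()] for t in terms]
        return set.intersection(*results)
    terms = query.split(" OR ")
    return set.union(*(index[t.strip().lower()] for t in terms))

add_page("https://example.com/godot-notes", "godot gdscript signals tutorial")
add_page("https://example.com/search-rant", "google indexing complaints")
print(search("godot AND tutorial"))
print(search("godot OR google"))
```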
I have a personal web site with almost 1500 pages containing blog posts, newsletters, reviews, and articles going back to the late eighties. I've been slowly consolidating all my writing on this site. For years I had kind of assumed that if I built it, (eventually) it would be discoverable via organic search. I recently determined that I had to put a Google ID on it and a sitemap file. So I've found scripts, imperfect though they are, to crawl everything and make a sitemap. Google has indexed some of it, but seems to have stalled out, with the bulk of the pages stuck in the "Discovered, currently not indexed" state, and now I'm not sure if some of the best content will _ever_ be searchable via Google. Really disappointing.
> and in all that time I’ve never heard of Google not indexing a site. […] Google used to just index every site whether you wanted them to or not
Google can only index what they know of directly, or can indirectly discover e.g., by links. I have unindexed sites because, well, they're not linked to on the public Internet.
I can pretty easily see a 2 month old website being outside that; the question is does the rest of the Internet know about it? (And, the public Internet. A shared link in a Discord channel also isn't visible to search engines…) Some of the examples contain statements that indicate Google could or should know of them, but not all of them.
I've noticed this for a side project I run, https://cfbpedia.com/. I upgraded its server a while back and unknowingly borked its SSL redirect. I corrected it and Google hasn't picked it up again since, despite my fiddling in every way in their Console; it's still just "not indexed". If you search for it directly in Google you will find links to it, but never the site itself. I figured I was doing something wrong and didn't care enough to fix it, but it definitely seems like Google is making some sort of attempt at pruning.
I have a website that is #1 when it comes to the problem I'm solving. No one comes close.
It's mostly because it's a money-saving thing for consumers, and that isn't really profitable. It's low-hanging fruit, and I have the best website for it. Nothing really comes close; most alternative websites make mistakes in their advice because they are using feelings rather than millions of entries of data.
Anyway, if you search specifically for my most popular metric, you will always get my website. If you google 'cheap X', you will get inferior websites.
Even with SEO optimized, there are just bigger websites that are friendlier with Google, linked by other websites, or maybe they have better SEO. Whatever the case, it makes me wonder what kind of websites I miss because I use Google.
> Users of Google search want the best results, not all results.
The ironic thing is Google is violating their earlier principles to provide "best results." IIRC, one of their big early differentiators (which they made a big deal about), was making the default query operator AND and not OR. A lot of early search engines used OR to pump up their "total hits" numbers, now Google essentially does the same thing by dropping terms from your query if the number of hits are "too low."
The answer has not been yes historically. There's always been way more content on the web than anyone can afford to actually make (interactively) searchable. The capacity of the index is a precious resource, and selecting exactly which pages to spend that resource on was always a key issue in search quality.
Can confirm. Had to manually submit all URLs to Google to get them indexed. Once indexed, everything worked fine, like it used to back in the day. Also, this has been the case for every website launched over the past 12-18 months.
In other words, this was already an issue before ChatGPT and the algo seems to be severely broken. High time for a thorough system check.
Adsense reviews are just as messy. Another department in need of a big shakeup.
I don't know what's going on there, but there's definitely lots of chaos behind the scenes.
They've been flooded with large amounts of new domains/sites hosting generated garbage content. I suspect this is just a barrier to slow it down until they figure out how to detect such spam better.
The quality of searches has gone down so much. A lot of the time these days you end up on these sites which have a page for every permutation of a sentence, each pointing to their page full of 'answers'.
Who is to blame? Google. For years they have pushed people to have slabs of text. Now I need to read through somebody's life story and how they stubbed their toe against a rock while hiking in the Sonoran foothills, followed by a recipe for pulled pork tacos.
Each time I search for something, the first two or three pages, maybe even more, are just shops and affiliate-link sites that offer the thing I want information about - at least, if the thing is a product. Not a single forum reference. I need to add the keyword "forum" in order to get such results. To me that is typical corporate degeneration and decline: too big to fail, too trashy to be relevant any longer, yet omnipresent.
It costs money, which is how they can be an actual search engine rather than an ad portal that pretends to be a search engine.
I have seen the future. Crap is free. Endless amounts of crap. Soon all this crap will be AI generated and designed to addict you. If you want anything that is not crap you will have to pay for it.
I'm a huge fan of self-hosted and decentralized stuff but a search engine is an area where I can't think of a way to do such a thing. The bandwidth requirements for continuous spidering and the data storage requirements are too high, and if you tried to distribute it you'd end up with an absurdly chatty protocol that couldn't be used by anyone with anything short of an unmetered full duplex fiber link.
The next best thing is a company that I pay to be a search engine for me and does that without trying to shove ads at me.
The complaint is still valid. I have subscriptions to a few sites, mostly news. I get why I need to login, otherwise how would they know that I've paid. It's still a pretty poor experience, I pay for a service, and now I need to fiddle with the settings in my browser so the site I pay for can remember my credentials.
I get that part of the issue is that I keep a weird privacy-focused configuration in my browsers and delete cookies when I close the browser. Still, it results in a poor user experience for a service that I pay for. I don't have any good suggestion on how to fix it, but it is valid to complain about having to sign in.
It's a poor user experience because you break it on purpose. Whatever other way they can come up with to permanently identify you, you will block that in the name of privacy and keep complaining.
I don't really care about privacy from a company that already has my credit card information, and my real name and address in most cases. I care about privacy from "the others", and the paid services are collateral damage. It does appear from another comment that Kagi actually thought about it and allows a token in the search URI, which is pretty neat.
You can get an access token from settings which can be provided in the URL. Set your search engine URL to include the token, and your searches just work, even in Private / Incognito.
Yeah, they are one of those weird businesses that provide services in exchange for money. They say it costs them $12.50 to serve 1000 queries, and the fee isn't that much more.
There used to be some sample searches on their blog, but it looks like those are also paywalled now. I wonder if that was intentional.
- Log into webmaster tools and find out what is going on? Maybe there are issues with the page. (I see Google doesn't call it webmaster tools any more.)
- You can explicitly submit a page to be indexed.
- To use the tools, you have to prove to Google that you own that domain: follow their instructions.
Google sees “putting your site online, getting it indexed, having it appear in search results, and receive traffic” to be competitive with “pay for advertising and premium placement.”
The long term answer will be some kind of “Google Blue” where you have to pay for any placement, not just premium placement.
Google has also begun aggressively filtering domains from sending email to Gmail users. This rolled out last November but they've ramped it up this past week to the point that many "indie" domains are 100% blocked from emailing Gmail accounts.
I haven't had issues sending emails to Gmail from my personal domains... but spamhaus/sorbs blocks Gmail on the inverse because people use it to send so much spam.
My website seems to have no indexing problems. I would be shocked if I had literally any other sites linking to mine (Small personal website), so at least from my perspective the indexing behaviour seems fine.
As a random data point, I registered a new domain about a month ago and didn't do more than add a simple landing page yet, and it is properly indexed by Google.
So many people were clicking on the link from here that it inadvertently DDOS-ed my server. I had to turn on the "under attack" feature in Cloudflare just to get the other sites running again.
My pages keep getting kicked out of Google's index for no apparent reason, only to be thrown back in just as randomly. It's like a never-ending rollercoaster ride! The GSC comes to the rescue by helping to get important pages indexed fast, but guess what? You can only submit a measly 10 URLs per day!
This makes me wonder if we will go back to the GeoCities model for hosting personal websites. Perhaps it is already the case with .github.io and .substack.com.
It has already become hard to discover websites hosted on their own domains, unless you find them organically (such as on HN).
Brave is working on this idea called Brave Goggles, where you can click on a goggle and it tweaks the algorithm to prioritize that type of content or give specific sites a higher ranking.
It's pretty cool.
Their results have been fantastic for me too, and it's supposed to be private as well.
It's pretty much replaced Google for me without me even noticing.
The problem is that Google actually has support for quotes, even if it's not reliable. Try searching for short song lyrics or poetry, or pretty much any obscure quote on ddg and you can't find it. Putting individual words and phrases in quotes does nothing on ddg unless they suggest it for you...in which case it often still doesn't work
How much traffic could that be? Given the number of comments on this page, I can't imagine it being more than a few thousand clicks? So why would any site go down?
I always wondered: how did we manage to run anything, let's say, 20 years ago? We should have many times more memory, CPU, and bandwidth available... Or are we just that many magnitudes more inefficient?
Websites often went down under load 20 years ago too but people were far more quick to think it was their own connection on the blink.
Yes, sites were much better engineered too, with smaller and less rich and less dynamic payloads, but don't count out a change in expectations along with it.
I don't understand this at all. If you have a static site (which a blog should be), CDNs will allow you to handle practically unlimited traffic for free.
Even without a CDN you can host the static files in a bucket for practically free.
Heck even serverless platforms usually give you 1M function calls for free each month.
It's definitely a function of expertise. You could get a free host that could totally handle the hug of death. You could know how to deploy a static website more efficiently.
attempting to reduce X to “just a function of cost” will almost always “work” - if one assumes themselves experienced enough to know how to spend hypothetical dollars.
the amount of traffic a website can handle is impacted by both. with insufficient experience, website won’t scale, money won’t be spent.
There are a bunch of tradeoffs--cost, cost predictability, control, redundancy, flexibility, etc. Money isn't a magic wand as you say and, honestly, I'm not sure how much extra I would pay in general on the off-chance that a blog post might go viral every 10 years--if that is indeed the tradeoff.
Another "bug" that seems to manifest quite often: if I search for a specific phrase or unique word on a page that I found in a SERP, so I know it's crawled that page, it often doesn't return that page either.
Add to that the automatic CAPTCHA-hellban you get if you use "site:" in anything more than a tiny amount (and the one you still get if you search "too much"), and I realise that there's increasingly huge amounts of information out there on sites that Google may have crawled before and knows about, but doesn't want to show me for some reason. I remember it used to be much easier to find information about obscure topics even if it meant wading through dozens of pages of SEO spam; now it's nearly impossible for anything but the most vapid of queries.