Hacker News

If I can operate Google, I can find anything

Or the corollary: "if I can't find it on Google, it doesn't exist." That seems to be the perception among people these days, and in some ways it's quite scary how much power Google has over what information people can find on the internet. I've noticed a decline in the diversity and breadth of its search results over the years: sites I used to visit containing detailed technical information (many of them are still around) have basically disappeared from the results, overtaken by heavily SEO'd sites that have only superficial information (a lot of the time they contain words that only approximately match part of the query, which makes it even worse). It seems that in attempting to prevent spam, a lot of the genuinely good content that just "wasn't SEO'd enough" has been buried too. The mundane, shallow, and practically worthless content is emphasised over the detailed, in-depth information that I believe certainly exists out there. If the internet was shit in 2003, it's even more shit now.



Most of those high-quality sites that contain detailed technical information have something in common: they don't carry Google's advertising, so Google earns no revenue when you visit them. Despite Google's protestations over the last few years, I haven't heard a better explanation for the shift in the nature of the results they return.


In reality, the issue is simply that the sites that contain in-depth writings don't get updated as often as the blogs, forums, and content farms that contain superficial information. Google's engines are tweaked to always prefer sites that are updated very regularly, so as to sift out obsolete information.


Why is this? In many fields, information does not become "obsolete".

Google's policy on this issue seems to be pushing the internet in a superficial direction.


Google is a tech company, tech is seemingly outdated as soon as it arrives. They push the tech line of thinking everywhere they go, even if it doesn't belong.


Well, to be fair, differentiating algorithmically between information that goes stale and information that stays valid is non-trivial. If you have to choose between promoting new and updated information and promoting static information, I think the former is the better choice for most queries.
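The freshness preference being debated above can be sketched as a recency-decayed ranking score. This is a toy illustration of the general technique, not Google's actual algorithm; the function name, the multiplicative decay model, and the 90-day half-life are all assumptions for the example:

```python
import time

def freshness_score(base_relevance, last_updated_ts, half_life_days=90.0,
                    now_ts=None):
    """Toy ranking score: query relevance decayed by document age.

    base_relevance: how well the page matches the query (higher = better).
    half_life_days: age at which the freshness multiplier halves; a small
    value strongly favors recently updated pages over static ones.
    """
    if now_ts is None:
        now_ts = time.time()
    age_days = max(0.0, (now_ts - last_updated_ts) / 86400.0)
    decay = 0.5 ** (age_days / half_life_days)
    return base_relevance * decay

# An in-depth page last updated a year ago vs. a shallower page
# updated yesterday, under a 90-day half-life:
now = time.time()
deep = freshness_score(0.9, now - 365 * 86400, now_ts=now)
shallow = freshness_score(0.4, now - 1 * 86400, now_ts=now)
# The shallow-but-fresh page ends up ranked above the deep one,
# which is exactly the failure mode described in the thread.
```

The point of the sketch is that any score of this shape penalises static reference material regardless of quality, unless the ranker can tell which topics actually go stale.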


I quite clearly remember seeing Google ads on many of these sites, as they were unobtrusive enough that I didn't block them. My theory is still "not SEO'd enough", or perhaps, given the huge amount of text and links they tended to contain relative to styling elements, they looked link-farm-ish enough to Google's algorithms to get penalised.


Or SEO'd at all. It's easy, and maybe appropriate, to hate on SEO, in part because it works. If you have a bunch of people pouring money, thought, and energy into optimizing crap content, and many people producing high-quality/niche/non-commercial content giving it little to no thought, it's not surprising that the crap floats to the top. It doesn't require malfeasance on Google's part.

The other thing to consider is that Google has to optimize for the general case. If their mission was The Best Physics Search Engine or The Best Academic Search Engine, they might do a better job with more esoteric material. But it's meant for everyone, and most people want less detailed, more digestible content.


I have yet to see any evidence for this, and it's a wild accusation that I'm seeing a lot on HN without merit. Occam's razor applies here as well, the simplest explanation being that search is a difficult problem to solve. And I've written here before that for non-US users at least, Google's results are vastly better than everything else, so it isn't that competition doesn't exist. But hey, you're free to try and solve this problem in a better way.


Evidence would be difficult to offer given that their algorithm is a changing black box.

I also don't see how "search is hard, which is why Google's search results have issues," is a simpler explanation than "Google has some bias towards results that they profit from."


Why not both? Search is really easy if you have a corpus of static information. Search is very hard when you have a huge amount of information that changes, many people are looking for the newest information, and outside influences want to bias search in their favor for profit too.


By any chance, could you share any links to those useful sites that no longer appear in Google's results?


Well, that's the problem with having an ad broker serve as the primary user interface for the web.

I'm honestly not sure how that problem gets solved. In theory, it just needs a collaborative low-friction curation scheme, each of whose contributors has both deep knowledge of one or more fields, and the Google-fu to find worthwhile information about them. Indexing and staleness aren't particularly hard problems; in the former case, Google itself can probably serve the need, and in the latter, the Internet Archive will amply suffice. Dealing with bad actors, on the other hand, strikes me as a highly intractable problem.

Actually, the more I think about this, the more it seems like it might be worth pursuing further -- certainly at the very least it'd be preferable to the haphazard collection of bookmarks, spread across three different machines, in which currently reposes my collection of substantive information on a variety of technical subjects related to my hobbies and occasional pursuits. The technical side seems relatively straightforward. The user side I'm not so sure about. How do I make sure the information is good? How do I filter out incompetence and malice on the part of curators?


Or more scary (and tin-hat-y) how Google can effectively choose what does or does not "exist" on the net, by not making it show up in results.

It's already doing that via DMCA requests.


You ascribe to Google an agency that belongs to another here. DMCA requests are not Google's choice; they're made by a third party, and Google is legally required to comply with them in a specific way.


Google very clearly marks when a result is removed due to a DMCA request.


That doesn't make it any less non-existent.


It does. If you click through to the DMCA takedown request, there's a list of removed pages. It's more annoying to use, but the information is still there.

(For what it's worth, I'd prefer if the results weren't removed at all, but the current legal environment doesn't allow for that.)


I'm surprised Hollywood would let Google get away with linking to DMCA requests that contain those links. Mind confirming this is what happens?


Yes, it looks like this:

  In response to a complaint we received under the US Digital Millennium
  Copyright Act, we have removed 1 result(s) from this page. If you
  wish, you may read the DMCA complaint that caused the removal(s) at
  ChillingEffects.org. 
The text "US Digital Millennium Copyright Act" links to https://www.google.com/support/bin/answer.py?answer=1386831, and the text "read the DMCA complaint" links to the DMCA request. The DMCA request includes a list of the removed URLs.

If you're interested in seeing one of those requests, go to https://www.chillingeffects.org/ and search the name of a media company.

If you're interested in seeing the message in a Google search results page, try searching "[popular US show] episode 10" and scroll to the bottom of the page.



