
FYI that your site is blocked by this list: https://gitlab.com/The_Quantum_Alpha/the-quantum-ad-list

HN post for that list here: https://news.ycombinator.com/item?id=25512273



That list is questionable at best.

The list author makes many claims without publishing any source code at all, just a lot of buzzwords. The reddit r/pihole moderator pulled the post: https://www.reddit.com/r/pihole/comments/kh5dit/the_quantum_... . The thread was more entertaining before the list author deleted every downvoted comment they made.


[0] is perhaps even more concerning - apparently it bears a striking resemblance to Steven Black's (slightly more reputable) list[1] [edit: plus a few hundred thousand other rules of questionable sourcing].

[0] https://gitlab.com/The_Quantum_Alpha/the-quantum-ad-list/-/i...

https://github.com/StevenBlack/hosts/issues/1487

[1] https://github.com/StevenBlack/hosts


I agree that it's questionable. I commented the same in the thread I linked: https://news.ycombinator.com/item?id=25513161

However, at least for Pi-Hole users, more is usually better, so I added the list to my Pi-Hole.


> > We were testing an AI that could show some basic emotions about internet content, and turns out it was very precise at getting “annoyed” by ads and “unsolicited” third party connections…

Holy shit that's such bullshit.

They are basically claiming they invented an artificial general intelligence, with feelings, that happens to feel the same way about ads as us. It's basically sentient, and instead of publishing research papers, they turned it into an ad blocker.


It's just colorful language for the fact that ads and spyware score high on their model for bad websites.


First: Marketing bullshit is still bullshit.

Even if it's not morally wrong, it makes you look like an idiot who doesn't understand the technology you are selling. In the worst case it might even be used as evidence that your work is a fraud.

There is no benefit. To the lay person, it would sound just as impressive to say "We trained a machine learning model to detect ads and spyware", and that wouldn't immediately set off alarm bells with people familiar with the current state of machine learning.

Second: Talking about fraud, the evidence linked above is pretty strong.

Their alleged AI is somehow detecting test domains from other authors' lists as "ads or spyware". Test domains that aren't linked anywhere on the internet.

In one "smoking gun" example, the test domain doesn't even have a DNS entry. The alleged AI can't even load the domain to scan it.
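The "no DNS entry" point is easy to check for any domain. A minimal sketch in Python, assuming the standard library resolver is good enough for the purpose; the function name `has_dns_entry` and its `resolver` parameter are made up for illustration and have nothing to do with how either list is actually built:

```python
import socket


def has_dns_entry(hostname, resolver=socket.getaddrinfo):
    """Return True if the hostname resolves to at least one address.

    `resolver` is a hypothetical hook (not part of any list's tooling)
    so the lookup can be swapped out, e.g. for offline checks.
    """
    try:
        # getaddrinfo raises socket.gaierror when the name cannot be resolved.
        return len(resolver(hostname, None)) > 0
    except socket.gaierror:
        return False
```

A domain for which this returns False cannot be fetched over HTTP at all, so a scanner that works by loading pages would have nothing to score.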


No, more is not usually better. Especially with a garbage ""AI-generated"" (not) list with untrustworthy maintainers like this one. It's better to add a low number of lists with trusted maintainers, who actively curate their lists and respond to false positives. That means no "mega-list" abominations like oisd.nl.

I suggest: https://www.github.developerdan.com/hosts/

https://gitlab.com/curben/urlhaus-filter/raw/master/urlhaus-...

https://raw.githubusercontent.com/notracking/hosts-blocklist...

https://raw.githubusercontent.com/anudeepND/blacklist/master...


Can you explain why more is not usually better?

I added the 4 lists you recommended to my Pi-Hole, which added a net new 73,253 domains to my Pi-Hole. My total is now close to 2M.


You could just blacklist *.com and be done with it.


You joke but I would be most happy if all my web needs could be served on .onion addresses


Hm, well I've got to work out how to get off that list! Thanks for giving me the heads up.

EDIT: I'm not sure quite how to deal with being put on ad lists. Sure, people can upload any file to our host so it's plausible that someone, at some point, has uploaded an advert. Someone could also redirect to an advert domain and we'd have no way to really deal with that unless it was reported. Ideas are welcome for solutions.


Just some thoughts:

1. Reach out to the list maintainer to see why your site was added.

2. Create a blocklist comprised of those ad lists. Don’t redirect to sites on the blocklist.

3. (Of dubious practical value) Create a Terms of Service that says users may not use your service to link to advertisements.
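Suggestion 2 could be sketched roughly like this in Python. It assumes the ad lists are in hosts format ("0.0.0.0 ads.example") or one domain per line, and that blocking a domain should also block its subdomains. `parse_hosts_list`, `is_blocked`, and `IGNORED` are illustrative names, not part of any existing tool:

```python
from urllib.parse import urlsplit

# Housekeeping entries that hosts files often carry (assumed set, not exhaustive).
IGNORED = {"localhost", "localhost.localdomain", "broadcasthost", "0.0.0.0"}


def parse_hosts_list(text):
    """Collect blocked hostnames from hosts-format or plain-domain list text."""
    blocked = set()
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        fields = line.split()
        # Hosts format is "<ip> <host>"; plain lists have a single field.
        host = fields[1] if len(fields) > 1 else fields[0]
        host = host.lower().rstrip(".")
        if host not in IGNORED:
            blocked.add(host)
    return blocked


def is_blocked(url, blocked):
    """Return True if the URL's host, or any parent domain, is on the list."""
    host = (urlsplit(url).hostname or "").lower().rstrip(".")
    labels = host.split(".")
    # Check "a.b.example", then "b.example", then "example".
    return any(".".join(labels[i:]) in blocked for i in range(len(labels)))
```

A shortener would call `is_blocked` on the target URL when a link is created and refuse if it returns True. That only covers the first hop; a target that later redirects to a listed domain would still need to be reported or rechecked.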


+1 to the second suggestion as a low-effort way to make some headway in staying off blocklists.

A place to start might be this large, very popular list that combines a bunch of other lists: https://oisd.nl/

Actual text file is here (large file warning): https://hosts.oisd.nl/

Just prevent your service from shortening links to any of those domains.


You might want to consider checking for hosts listed in https://github.com/notracking/hosts-blocklists

This is an excellent merged blocklist, with a public whitelist (oisd is fully closed, with no insight into what is whitelisted and why, which also causes more false positives).


No longer the case: https://oisd.nl/excludes.php


Right on time, sjhgvr can't allow his rep to be (rightly) blemished on any corner of the internet.


> 3. (Of dubious practical value) Create a Terms of Service that says users may not use your service to link to advertisements.

That seems entirely unenforceable. Aren't ALL websites ultimately advertisements?


> Aren't ALL websites ultimately advertisements?

No. Some are just information, art, or what-have-you. Here's one I just found now.

https://aaron.axvigs.com/


That could still be considered an advertisement of his existence and writing skills.

If the goal is purely informational, why is the author's name attached?

The site also advertises the CMS it runs on.

That's my point: by a reasonable standard, ANY site that exists is an advertisement for something or other, thus a rule saying "no linking to advertisements" is worse than useless.


This must be the mindset it takes to work in the ad tech industry.

Ads are sort of like porn. There are lots of things you certainly know serve no other purpose than to advertise something and you can block them outright. Native advertising is certainly difficult though.


I do not work, nor have I ever worked, in ad tech.


I guess you have a different understanding of what "advertising" is than the general understanding.


advertising or ad·ver·tiz·ing [ ad-ver-tahy-zing ] - noun - the act or practice of calling public attention to one's product, service, need, etc.


I believe it's possible for a website to exist without calling attention to anything.

Or perhaps you believe the mere existence of information is a call for attention.


Doesn’t all content exist to receive attention?

I think there would be exceptions, like test sites, personal experiments etc. that could make it on to the internet without seeking attention, but any content designed for consumption is attention-seeking.


> Doesn’t all content exist to receive attention?

Maybe. Attention can also be granted without it having been called there. There are also websites not designed for consumption.

If every website is advertising, then surely most of human discourse and activity would also be considered advertising. What's even the purpose of the word?

You're not going to convince me that everything is an ad, and I probably won't convince you either. I'm not interested in playing any further semantic word games. I'll read any replies you make if you choose to, but I have nothing more to offer in this thread.


I agree that not everything is an ad. I think the parent comment is fairly trite.

I do believe all content made for consumption (even purely informational content) is attention-seeking.


For me the problem is that you hide URLs, so I click on a link and have no idea where I'll end up. So I block all URL shorteners on principle on my Pi-hole.



