My #1 match was an account[1] banned for "posting unsubstantive comments and repeatedly breaking the guidelines". Now, I may be biased, but I don't think that's accurate :v
I did the search on myself, trying to find kindred souls who share my passions: Gödel's theorem, viewing our current carbon-based civilization from the perspective of a silicon-based one (or of aliens), functional programming... But none of them are even close. This #1 match has some views I'm totally unfamiliar with, but I have an opportunity (which I appreciate) to understand other views.
In my opinion, this service doesn't have a good S/N ratio. It can give you irrelevant information.
I have to agree. Nothing stood out as being any way similar. It's hard to tell what their measure of similarity is here. This might be a case of let's just throw the data in, and see what comes out.
It caught users who share my style of dumping an ‘in-line rhetorical saying’ into their posts. It’s not terrible; honestly, you have to laugh at how predictable you are.
That is doubtful. We don't shadowban established accounts: we tell them that we're banning them, and why: https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que.... Shadowbanning on HN, at least for the last 7 years, has been reserved for spammers and serial trolls. It's possible that we made a mistake and neglected to tell you, but it's far more likely that we did tell you.
We don't ban people for criticizing PG, as anyone can easily see for themselves by using HN Search or looking at any recent thread from paulgraham.com.
If you're going to make a claim about why you think you were banned, you should provide a link so readers can make up their own minds. When it comes to "I was banned" stories, people say all kinds of things, most of which don't hold up against the actual record.
You banned me two days after I commented that YC can be as exploitative as, or more exploitative than, China's deals with African countries, and that the parent commenter, defending YC while condemning China, was acting out of racism and ethnocentrism. You can see how he brought a bunch of low-quality links, I refuted them, and you came along two days later (I only realized that now) and banned me.
Oh well, I suppose you will ban me again. BB cannot be criticized.
That doesn't link to a banned account. I assume you mean this one: https://news.ycombinator.com/item?id=26659584. We told you we were banning you in that very thread: https://news.ycombinator.com/item?id=26678745. Comments like "I must conclude you are a dumb person letting his latent racism to take over or you are aware you are acting on bad faith and you just dont care" are obviously against the rules here and have nothing to do with PG, YC, China, or any particular topic.
Moreover, we'd warned you and asked you many times to follow the site guidelines before that:
If you break the rules that often and ignore that many warnings, it's not surprising that you'd end up getting banned. This was not a shadowban and not because you criticized some particular person.
All this is a pity because you posted many interesting comments in the past and we would much rather have you as a contributing user. The sad truth, though, is that the harm you cause by breaking the site guidelines exceeds the good you contribute with the interesting comments—so I don't think we made the wrong call.
As Cardinal Richelieu apocryphally said, "If you give me six lines written by the hand of the most honest of men, I will find something in them which will hang him."
I don't doubt you can present a similar set of "warnings" for any undesirable who refuses to toe the corporate line you are ordered and paid to maintain. All those rules are ambiguous, opaque, arbitrary, and subjectively interpreted and enforced, but you know that.
The most ironic thing is that the measure you take against people who refuse to follow your faux-polite tone is censorship and virtual obliteration, something 1000 times worse. Most of the people I got personal with (in argument) here were being racist/classist/homophobic/ethnocentric, but since they can shield themselves behind rhetorical what-ifs and sing paeans to the geniuses of YC, they have carte blanche.
Give me the dysfunctional governments we all have, a million times over, before techno-fascists like you and your bosses, censoring anyone who does not suck up to them.
Not to contradict the Cardinal but your accounts have broken the site guidelines a lot more than the median commenter, and we really don't care about your views. Plenty of other commenters express similar views without getting banned. Actually we really, really, don't care. We're just trying to have an internet forum that doesn't suck.
It isn't about politeness, btw (let alone "faux" politeness) – you won't find that word, or that concept, in the site guidelines. It's about treating other people respectfully, and abstaining from garden-variety internet dreck. Let's not noble up the latter with self-flattering rhetoric.
I have 2 accounts at over 1k karma. I generally start a new one when I make major moves (across multiple state lines or between countries).
My accounts did not correlate, probably because they have been inactive at staggered intervals.
> We took usernames and respective comment histories from the past three years
However, putting the names in the Doppelganger search yielded very similar results and the comments of the users are from like-minded people. Well done.
> A reminder that BigQuery (as used in the query in this link) is the best way to play with Hacker News data; don't scrape HN data manually!
The `bigquery-public-data.hacker_news.full` table appears to be up to date with the most recent HN data as well (table last updated today).
However, I'm not 100% sure the query is correct for getting all links, as running the query on the full dataset returns the same results as running it on 2006-2015 only. And I value my sanity too much to fuss around with the regex.
In fact, considering that unknown third parties freely gather such similarity scores and correlate accounts across different sites, by now it's a given that one's alt accounts have to adhere to different stylistic choices.
I think the weakness of this technique is in the normalization of the vectors. The close match comments don't look like mine because the content of my comments has to be massively compressed. The close matches appear to have been massively expanded.
Or to put it another way [1], cosine similarity is not enough here. Magnitude also matters.
This is probably a case where traditional information retrieval methods should play some role. The data are not really big enough that pure cosine similarity is warranted. [3]
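The cosine-vs-magnitude point is easy to demonstrate. A minimal, self-contained sketch (toy vectors, not the actual embeddings): scaling a vector leaves its cosine similarity untouched, while Euclidean distance still sees the size difference.

```python
import math

def cosine_similarity(a, b):
    # Dot product divided by the product of magnitudes: scale-invariant.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def euclidean_distance(a, b):
    # Straight-line distance: sensitive to magnitude.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

terse = [1.0, 2.0, 3.0]       # a hypothetical "compressed" comment vector
verbose = [10.0, 20.0, 30.0]  # same direction, massively "expanded"

# Cosine similarity is blind to scale: these are a perfect match.
print(cosine_similarity(terse, verbose))   # 1.0
# Euclidean distance sees the magnitude gap that cosine ignores.
print(euclidean_distance(terse, verbose))
```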
[1]: a phrase that my actual Doppelganger must use. [2]
[2]: and also endnotes like these.
[3]: performative erudition is what is absent from all my matches.
If you get me and have always wondered why you never quite fit in, ask your GP about ADHD.
While folks are saying the people they got matched with share their comment style, keep in mind that, given the nature of the tool, they're looking for that. Also look for comments and opinions where you're not similar, to try to disprove the match.
I once made a Markov-chain IRC bot that people would still be convinced was smart today, because people discard the lines that make no sense when they're only looking to prove rather than to disprove.
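For anyone curious, the core of such a bot fits in a few lines. A minimal sketch (the corpus and function names are invented for illustration): build a bigram table, then random-walk it. Each adjacent word pair is plausible; the whole line usually isn't.

```python
import random

def build_chain(text):
    """Map each word to the list of words that follow it in the corpus."""
    words = text.split()
    chain = {}
    for current, nxt in zip(words, words[1:]):
        chain.setdefault(current, []).append(nxt)
    return chain

def babble(chain, start, length=10, seed=None):
    """Random-walk the chain: locally plausible, globally meaningless."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(length - 1):
        followers = chain.get(out[-1])
        if not followers:
            break
        out.append(rng.choice(followers))
    return " ".join(out)

corpus = "the bot sounds smart because the bot repeats what the channel says"
chain = build_chain(corpus)
print(babble(chain, "the", seed=42))
```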
I identified three usernames in this table right away! tosh, todsacerdoti, and pjmlp. In fact, I like the stories posted by tosh and todsacerdoti quite often and I like the comments posted by pjmlp very often.
I'm probably late to the party, but is there a reader app people use for following specific people? Or are you just referring to whom you've favorited / following and have a good memory for names?
I'd be more interested to see a list of users who have commented the most on the same articles as me. Seems like a better way to measure interests, even if it's indirect. Of course, this wouldn't distinguish between doppelgangers and evil twins >:)
Some of my favorite HN threads remind me of bubble tracks in old school particle colliders. Just stacks of tangents that are hilariously off-topic but somehow, at times, still interesting and even informative.
One idea is to rank users more highly when their comment is closer to mine in the comment tree. Also to weight users more highly when we both comment on an article that has relatively few comments.
Yep, my doppels all seem to have participated in topics and threads that I ignored. We seem to have never interacted with each other, and I don't seem to have voted any of their comments.
If you encounter the error that "This user does not exist or does not have any activity.", check the case. They seem to be case-sensitive on this page.
(I checked with correct and incorrect spellings of
"dang", "pg", "TeMPOraL" and "_Microft".)
Hah, I’ve thought the same thing. I think I’d enjoy working with myself quite a bit. As a housemate, though, I doubt we’d cross boundaries often. Very thankful to be married to someone who truly complements me.
If you recognize your doppelganger, it's probably because you have an interest in the same topics. Your opinions and your posting style may be different—or even opposite—but from a certain perspective, you have more in common with each other than most.
I'm skeptical that this tool does a good job of identifying semantic meaning of a comment, but I bet it gets the topic right.
This is a nice case study on why it's good to predict distributions instead of raw values. Low-comment accounts should have a high degree of uncertainty, which should translate into weaker similarity scores if you compute the expected similarity of two accounts.
I’d love to see stats on how well this matches accounts to themselves if you split an account’s comments into two pseudo-accounts and tried to match them.
Well, the first comment I saw from my top doppelganger match began "I have been one of the toxic persons in a workplace. I was young and immature and an ungrateful arrogant prick..."
Reviewing additional posts, I don't think we're any more alike than a random pick.
The first account I found was banned, but the 2nd match says a lot of the same things I do. I would send them in my place in a debate if I could not make it.
I don't comment much; my next-to-last comment was six words, including the word "underpaid", and now my "Doppelgänger" matches are all accounts whose last comment contained "underpaid"...
It seems I have some overlap with qsort. Given the posts I have seen from that account, I take that as a compliment.
I do wonder, though, if the model is smart enough to correct for when you quote other people; otherwise it might be measuring who you interact with most.
BTW, I'm not sure of the privacy implications of this, maybe someone else can comment on this.
Is this in any way distinguishable from just picking random accounts? I can't discern any similarity between the supposedly similar accounts.
I'm pleased to see that while most people have doppelgangers with similarity above 0.990, my most similar doppelganger has a similarity of 0.975. I'm a unique individual!
> It compares the semantic meaning of your comment history with those of all other users, and finds the top ten users whose comment histories are most similar to yours.
So maybe it's comparing comments from the entire history of the account and not just the recent ones, which makes it hard for me to compare? Would it be possible for you to tweak it so that it only compares, let's say, the most recent 30 or 50 comments?
It is. I just picked that because it was an already provided option and I recognized that name. The few pages of comments I compared between the accounts didn't show much similarity, but OP also replied that it compares 3 years' worth of comments, so it's hard for me to judge.
Yeah, 3 years is a long period. I think by default you should compare only the past 30-60 comments, with options for comparing the past year and the past 3 years. If that works well, you could actually build a "dating app for the intellectually curious". Might be easier to do using Reddit data too.
The number of comments about karma diverging between people's matches suggests you could add a simple ensemble approach, with a karma heuristic on top of the vector similarity search, to help results.
If the data used isn't already reflecting Karma, it could be a useful metric for representing a whole bunch of things (quality of comments, participation, length of time), and make the vector similarity more meaningful.
It would be interesting to see if users perceive the results as higher quality if you added a simple Karma similarity filter (maybe 400 points or something based on the standard deviation of the average karma score), and then returned the closest matches filtered by that metric.
Vector similarity search as a service looks like a good market. Out of interest, what would the cost translate to for running something like this in practice using the API as a customer?
> Out of interest, what would the cost translate to for running something like this in practice using the API as a customer?
We offer usage-based billing at $0.10 per GB of memory hour. (https://www.pinecone.io/pricing/) For this app, our eng team knows for sure but I think the entire index is less than 1GB so it would be just $73/month if we keep it running 24/7.
Vector similarity search is new for most companies, so we want to make it very easy to try and test stuff out in production, without cost being a barrier. Even for larger volumes (40GB+) we offer volume and pre-commitment discounts.
PS: Possibly you could add a floor (say, less than 20 karma, to exclude shill or throwaway accounts), or ignore karma differences over a threshold (say 1000). I think applying simple filters based on common sense or domain knowledge can help vector similarity searches with sparse data or pollution. Just a random thought :)
The first two people in the list did not look like me much. But the third one (NeedMoreTea) was an interesting hit, commenting in a similar fashion and exploring similar topics, not necessarily from the same perspective. I am now immersed in his comment history.
Also, funnily, I really like tea and I drink ~ half a gallon a day.
My matches seemed pretty accurate in terms of general phrasing (I do a lot of 'I grew up in...' and 'I once knew someone who...'), but the subject matter was a bit concerning at times. Does the average account have so many judgmental downvote-heavy comments?
Many years ago, I built something similar for Drupal and its votingapi module. Just checked it: I was using Pearson's correlation coefficient between votes. It worked fast and was surprisingly accurate. You need access to voting history for that, of course.
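For reference, Pearson's correlation between two users' vote vectors needs nothing beyond the standard library. A minimal sketch with hypothetical vote data (the usernames and votes are made up):

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length vote vectors."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical votes (+1 / -1 / 0) three users cast on the same five items.
alice = [1, 1, -1, 0, 1]
bob   = [1, 1, -1, 1, 1]
carol = [-1, -1, 1, 0, -1]

print(pearson(alice, bob))    # high positive: similar tastes
print(pearson(alice, carol))  # -1.0: perfectly opposed voters
```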
Intriguing idea, but for me the criteria used to root out doppelgängers doesn’t lead to interesting results. My HN soul mates do not write like me, do not write about the same things, and do things I would never do, such as refer to Wikipedia articles.
Same. I don't really recognize anything in the top four for me.
Edit: I lied, top match 'asark' tends to share in my proclivity for the run-on sentence. Topically we're not really into the same things, so it has to be something along those lines.
I think currently it is a bit biased towards users with low comment count. Top similar user for me had 4 comments, top 2 only 1 comment. Then top 5 and top 6 again had 1 comment each.
Maybe the similarity score can be weighted by the number of comparisons somehow?
I wonder if there are simply more users with low comment numbers. But yes we considered only including users above some karma score, and still might do that in the future.
This depends on how the similarity is measured. If it is measured as % of agreement between comment texts then it's more likely to have better agreement with someone who has fewer comments rather than more.
For example I suspect that if we generated 1000 random users with random gibberish comments and varied their comment numbers from 1 to 10 or so, the top similarities would be biased towards low comment-count random users. This would be because having one randomly generated comment match your style is easier than having 2 randomly generated comments match the same style.
And if that's the case then the same issue would transfer to comparing real users.
@busymom0 suggested a great solution in this thread - only do comparisons based on "n" (like last 50) comments. This way every similarity would be measured using the same number of comments and users with low comment counts would be excluded automatically.
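The intuition in this subthread can be checked with a tiny seeded simulation under the "% of agreement" model described above (all numbers are made up): if each comment independently "matches" your style with some fixed probability, single-comment accounts routinely hit a perfect score by sheer luck, while ten-comment accounts almost never do, so the top of the leaderboard fills up with low-comment accounts.

```python
import random

rng = random.Random(1)
P_MATCH = 0.3  # chance a random comment "agrees" with your style by accident

def agreement_score(n_comments):
    """Fraction of a user's comments that happen to match, by pure chance."""
    return sum(rng.random() < P_MATCH for _ in range(n_comments)) / n_comments

# 500 random users each of 1-comment and 10-comment sizes.
one_comment = [agreement_score(1) for _ in range(500)]
ten_comment = [agreement_score(10) for _ in range(500)]

print(max(one_comment))  # 1.0: some single comment matched by pure luck
print(max(ten_comment))  # almost certainly well below 1.0
```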
I think the algorithm might have trouble with negation, questions, irony, jokes, etc. My top matches seemed inclined to comment on similar topics but with different opinions. Granted, my matches topped out at 0.972, so maybe I'm an outlier.
Yeah. I wonder why more people here aren’t mentioning that this could be used to help unmask throwaway accounts for people who usually post here with a different account.
A major reason for my first match seems to have been both accounts talking about licensing issues and specifically GPL. Other matches seem to have shared some superficially similar political perspectives on HN.
By the way, on a related question, I have an interest in being able to download all the things I've written on HN, but am not clever enough to hack together some tool to wget and parse my own user history.
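The official HN Firebase API makes this straightforward: a user's endpoint lists their submitted item ids (newest first), and each item endpoint returns its type and text. A minimal sketch with no third-party packages (`fetch_comments` and the other helper names are mine, not part of the API):

```python
import json
import urllib.request

API = "https://hacker-news.firebaseio.com/v0"

def user_url(username):
    # The user endpoint returns profile data, including a "submitted" id list.
    return f"{API}/user/{username}.json"

def item_url(item_id):
    # Each item (story, comment, ...) has its own endpoint.
    return f"{API}/item/{item_id}.json"

def fetch_json(url):
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

def fetch_comments(username, limit=30):
    """Yield the HTML text of the user's recent comments (hits the network)."""
    user = fetch_json(user_url(username))
    for item_id in user.get("submitted", [])[:limit]:
        item = fetch_json(item_url(item_id))
        if item and item.get("type") == "comment" and not item.get("deleted"):
            yield item.get("text", "")
```

Something like `list(fetch_comments("yourusername"))` would give you a local archive; note the `submitted` list includes stories too, which is why the sketch filters on type "comment".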
It seems to be case-sensitive, right? Maybe make it case-insensitive, or if that's not possible, at least mention it somewhere, e.g. by putting it into the label of the input box as "Username (case-sensitive)".
Apparently I'm rather similar to jacquesm [0], while shantly [1] is the top match of both of us. Associativity seems to hold up. Also, I'm in a way on the front page (?), so I'm happy ;)
All my matches seem to be contentious, argumentative people, and I can't say it's uncalled for. I don't think we would get along.
I'm wondering how matching people completely randomly, under the guise of a fake algorithm, would fare. Placebo matching, if you will. The results in terms of social interactions might be more fruitful.
Would you please stop posting unsubstantive comments (like this one and https://news.ycombinator.com/item?id=27572603) and specifically not cross into personal attack? We ban accounts that do those things, and we've had to ask you more than once not to.
If you wouldn't mind reviewing https://news.ycombinator.com/newsguidelines.html and taking the intended spirit of the site more to heart, we'd be grateful. Note this one: "Please don't sneer, including at the rest of the community."
I have several abandoned HN accounts because I switch to a new account after a while in order not to leave too much PII for doxxing. I don't care about karma at all.
This tool didn't offer any of them as doppelgangers, although (in theory) I should match my own style 100%.
That’s black box AI systems for you. Works in some cases, doesn’t work in others, fails hilariously occasionally; and people will make false accusations based on them.
Okay, the first one I got does actually write like me. It's funny because I didn't agree with what he says in his posts, but I totally recognize why it ranked highly similar to my writing.
Oh gawd, now I'm sounding exactly like that guy... (I just need to add some parens.)
I have bipolar disorder and can be "all over the map" as my ex used to say. I was pleased that my twins were similar, so maybe the model does a better job matching with varied and distinctive data - more signal to work with.
Kind of weird looking at mine. I've got an old account that I don't use any more. It didn't show up in the list of similar accounts, and the list for this account and my old account were completely different.
Would love to chat ([email protected]) if you’re interested in using APIs like this in production. NLUDB is supplying folks with equivalent APIs both as a SaaS and a private cloud install.
If you’re interested in rolling your own, a good place to start is the sentence-transformers Python package along with a KNN search service like Spotify’s Annoy.
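To show the shape of that pipeline without pulling in model weights or an index, here is a dependency-free sketch: a toy bag-of-words `embed` stands in for the sentence-transformers model, and a brute-force cosine scan stands in for Annoy's approximate nearest-neighbour lookup. All names and the corpus are invented.

```python
import math
from collections import Counter

def embed(text):
    """Toy stand-in for a sentence-transformers model: bag-of-words counts."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def nearest(query, corpus, k=2):
    """Brute-force KNN; Annoy replaces this scan with an approximate index."""
    q = embed(query)
    ranked = sorted(corpus, key=lambda name: cosine(q, embed(corpus[name])),
                    reverse=True)
    return ranked[:k]

corpus = {
    "alice": "monads are just monoids in the category of endofunctors",
    "bob":   "i deployed kubernetes to production and regret everything",
    "carol": "functional programming and category theory all the way down",
}
print(nearest("haskell monads and category theory", corpus))
```

With real embeddings you would swap `embed` for `SentenceTransformer(...).encode` and `nearest` for an `AnnoyIndex` query; the control flow stays the same.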
As a designer on HN, the top 5 matches didn’t resonate, as they were obviously developers who nerded out about code. My most relatable characteristic was that some of our comments are heavily downvoted. :)
I need the following users to go away, there isn’t room for all of us:
journalctl 0.994
thisisweirdok 0.994
veryworried 0.993
shantly 0.993
core-questions 0.993
not_a_cop75 0.993
lambda_obrien 0.993
LeoTinnitus 0.993
xwolfi 0.993
magashna 0.993
—————————-
Edit: On a more serious note, how can we use this to find echo chambers and homogenized news sources? Keep rolling with this idea, I think it’s important.
Seems it picked up on some phrases I use which are not too common among the masses. Beyond that there's little to no relation between me and my doppelgangers.
Most of the listed accounts for me have no activity in recent months, and the ones that do engage on different topics (they are quite technical; I am not).
Interesting. You seem like a closer match than the ones on my list. Looks like we're in different countries (US here), but we both comment on legal/tax matters. Are you a lawyer, by chance? I'm a former tax lawyer.
> We took usernames and respective comment histories from the past three years using the Hacker News API. Then we transformed them into vector embeddings using a pre-trained model, and loaded them into Pinecone.
These matches suffer from a very common problem in recommendation engines, which is a tendency toward matching with the long tail of randomness.
The problem is that your matching engine assumes perfect accuracy in vectorization, but actually, the vectorization is a sample from a distribution.
The distribution of the vectorization comes from the idea that it is trained on a mix of information and randomness, and this model is just one instance of the randomness. There are some sources of randomness in the model that you control, and others in the data.
To get better recommendations, you first need to get the distributions for each element of the vectorization. The simplest approximation would be a variance around the model output. Each item should have a variance determined by the amount of information available for that sample.
Now you have another problem, which is finding the closest vector. A p-test will fail you, because that tells you the probability that you came from the same distribution, which is going to be the random distribution for almost all pairs. You might instead ask for the probability that the two points come from a distribution that is distinct from the random distribution. You’d have to form that distribution, assess the probability of membership, and then the probability of rejection from randomness. You could also consider doing this for each vector element and returning a negative sum of log probabilities to represent the amount of information shared between vectors.
But ultimately, you need a truth set to test these methods against. This is easy. You just split each person into N people by randomly assigning their comments to one of N identities. This could also be used as an element of the variance.
An easy way to start is by bootstrapping the randomness. In this case, there is probably a random initialization vector in the model. You can bootstrap the distribution of each vector element w.r.t. the IV by running the model many times with random initialization vectors. You are still using a fixed point for the training data noise, but this is a start. To bootstrap the training noise variance, you can train many models using random selection of the data, and the same IV. A good heuristic split is to decide how many times you will run the model, x, and then do random selection by 1/sqrt(x). Ex. Run 100 times with 1/10th of the data in each run. Then you have two distributions for each vector element, one for the data randomness, and one for the model randomness, but the mean from the model randomness is the most informative. Now add the data variance / sqrt(x) to get a rough approximation of the true variance.
These are all hacky ways to get a decent improvement, and the formal methods will also give great improvement on top of that, but are easy to mess up and often quite expensive to bootstrap.
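The split-into-N-identities truth set suggested above can be sketched in a few lines. A toy example (a crude vocabulary-overlap score stands in for the real embedding similarity, and the comments are invented): a sensible matcher should score the two halves of one user closer to each other than to an unrelated user.

```python
import random
from collections import Counter

def split_identities(comments, n=2, seed=0):
    """Randomly split a user's comments into n pseudo-identities."""
    rng = random.Random(seed)
    shuffled = comments[:]
    rng.shuffle(shuffled)
    # Deal round-robin so every pseudo-identity gets a share.
    return [shuffled[i::n] for i in range(n)]

def profile(comments):
    """Toy profile: pooled word counts over all of an identity's comments."""
    counts = Counter()
    for c in comments:
        counts.update(c.lower().split())
    return counts

def overlap(a, b):
    """Crude similarity: shared-vocabulary fraction (stand-in for cosine)."""
    shared = set(a) & set(b)
    return len(shared) / max(len(set(a) | set(b)), 1)

alice = ["i love static types", "static types prevent bugs", "types types types"]
bob = ["dynamic languages are faster to write", "ship it and iterate"]

half_a, half_b = split_identities(alice, n=2, seed=0)
# A matching method passes this truth set if alice's halves score closer
# to each other than to an unrelated user.
print(overlap(profile(half_a), profile(half_b)))
print(overlap(profile(half_a), profile(bob)))
```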
From what I can tell, you're saying that when the model runs, it compresses the data into a crystallized, idiosyncratic set of weights, and running the model a bunch of times and averaging will smooth out the results.
Is that necessarily better? My matches felt like different people, and I like that crunchy recommendation butter more than the smooth version (unlike in real life).
I tried with proper case and it still didn't work.
Then I just mashed the button a few times and it said username found but error getting history, so I mashed the button a few more times and it eventually worked. So probably just the backend being overloaded.
I skimmed through my top matches and one of them appears to have attended the same undergrad as me, so that was interesting.
Do you retain any user data? Since most users are probably looking up their own username, it would be a simple task to match and log HN usernames and their respective IPs.
It probably depends on the accounts, and the algorithm. If the comparison algorithm uses some sort of distance calculation between 2 users to figure out how close they are, then you could have single directional relationships.
If my comments are on an island of weirdness, you may be the closest person to me while still being really far away. If your comments are relatively normal, you might have a lot of people around who are closer than me. That would make you my doppelganger, but not make me yours.
Edit: I just checked my doppelganger (barry-cotter), and I'm not even in his list :p. I've seen that username appear in a couple other comments under this post. I wonder if there are a few super normal users that a lot of people are closest to.
My matches have each other within their top matches, but none of them have me in their top ten.
I have a lower similarity score with my top match, compared to my matches' similarity scores with their tenth closest matches. So it is that island scenario that you described.
No. There's no reason for it to be symmetric; there can be an unlimited number of your closest neighbour's closest neighbours that are closer to him than you are. I mean, it's literally measuring distance between dots in n-dimensional space, and if you ask the same question about dots on a sheet of paper, the answer will be obvious.
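The asymmetry is easy to see with three dots on a line: A sits alone on its island, B and C sit close together. A's nearest neighbour is B, but B's nearest neighbour is C. A toy sketch (names and coordinates are made up):

```python
def nearest_neighbour(name, points):
    """Closest other point by absolute distance on a line."""
    return min((p for p in points if p != name),
               key=lambda p: abs(points[p] - points[name]))

points = {"A": 0.0, "B": 4.0, "C": 5.0}  # A is off on its own island

print(nearest_neighbour("A", points))  # B: A's doppelganger is B...
print(nearest_neighbour("B", points))  # C: ...but B's doppelganger is C
```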
1: https://news.ycombinator.com/threads?id=franciscrick1