It's my understanding of how embeddings work as well. The dot product (cosine similarity) of man · woman ends up being very similar to that of king · queen. This similarity could be considered subtraction if you stretch the metaphor enough.
If this is not a valid way to think about embedding vectors, would you care to elaborate?
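Roughly what I mean, as a small Python sketch (this assumes gensim's downloader and a small pretrained GloVe model; the model name is just one convenient choice, and any static embedding set would do):

    # Compare cosine similarities of (man, woman) vs. (king, queen)
    import numpy as np
    import gensim.downloader as api

    wv = api.load("glove-wiki-gigaword-50")  # small pretrained GloVe vectors

    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    print(cosine(wv["man"], wv["woman"]))   # high
    print(cosine(wv["king"], wv["queen"]))  # also high, roughly comparable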
Embedding vectors are what you get when you take a high-dimensionality vector of word IDs and reduce the dimensionality to something more manageable. (Think something like principal component analysis or singular value decomposition.)
The "similarity" here reflects word IDs that commonly occur together in those vectors.
There is no logical reasoning or attempt at semantic analysis here.
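As a rough sketch of that reduction (the toy corpus and the per-sentence "window" are made up purely for illustration; real systems use far larger corpora and fancier weighting):

    # Build a word-by-word co-occurrence matrix, then compress it with SVD
    import numpy as np
    from itertools import combinations

    corpus = [
        "the king rules the land",
        "the queen rules the land",
        "the man walks",
        "the woman walks",
    ]
    vocab = sorted({w for sent in corpus for w in sent.split()})
    idx = {w: i for i, w in enumerate(vocab)}

    # Count co-occurrences within each sentence (a crude context window)
    C = np.zeros((len(vocab), len(vocab)))
    for sent in corpus:
        for a, b in combinations(sent.split(), 2):
            C[idx[a], idx[b]] += 1
            C[idx[b], idx[a]] += 1

    # Keep only the top-k singular directions as low-dimensional embeddings
    U, S, _ = np.linalg.svd(C)
    k = 3
    embeddings = U[:, :k] * S[:k]   # one k-dimensional vector per word
    print(embeddings[idx["king"]])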
> The dot products (cosine similarity) of man . woman end up being very similar to king . queen
This is because 'king' and 'man' occur together in a distribution similar to that of 'queen' and 'woman'.
The idea that the embedding of 'king' is somehow a sum of 'autarch' and 'man', and that subtracting 'man' from 'king' and adding 'woman' somehow gives you 'queen', is an urban legend. Embeddings don't carry semantic meanings; they aren't dictionaries or encyclopedias. They are only statistical features about word co-occurrences.
This blog post [0] thinks that it is indeed possible to do arithmetic operations on them. My intuition is that they're vectors after all, and can be added and subtracted like any other vector. A word is just a location in that high-dimensional vector space.
EDIT: I guess there are different forms of word embeddings; apparently modern LLMs don't use static word embeddings like word2vec, and theirs are more contextual. Tokens aren't 1:1 with words either, of course. I guess it's more complex than "LLMs represent words as vectors". Still, it's a neat trick, and it really is that simple with something like word2vec.
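For what it's worth, the word2vec-style arithmetic is nearly a one-liner with gensim's pretrained vectors (model name assumed here; the blog post may use a different embedding set):

    import gensim.downloader as api

    wv = api.load("glove-wiki-gigaword-50")

    # vector('king') - vector('man') + vector('woman') ~ vector('queen')
    print(wv.most_similar(positive=["king", "woman"], negative=["man"], topn=3))
    # 'queen' typically shows up at or near the top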
> The idea that the embedding of 'king' is somehow a sum of 'autarch' and 'man' and that subtracting 'man' from 'king' and adding 'woman' somehow gives you 'queen' is an urban legend.
And I will: that passage is intentionally written to mislead in a self-serving way. If you know the math behind it, you understand that it's technically correct, but a layman or a cursory reading will leave you with a wildly incorrect idea.
It's above my pay grade for sure, but you can get in touch with him at either Microsoft Research, the Royal Academy of Engineering, or the Royal Society, where he holds fellowships. Might want to cc: Yoshua Bengio, who seems to be laboring under similar misapprehensions.
> If this is not a valid way to think about embedding vectors, would you care to elaborate?