It's worth noting that this comparison is flawed: Judy (iirc) is designed for 64...

tptacek · on June 10, 2010

The fact that Judy has to be optimized (and extensively modified) for specific architectures is a big part of this article's critique.

AlisdairO · on June 10, 2010

Absolutely, and I certainly acknowledge that this is a fair criticism. I guess my beef with it is more that the tone of the article is "Judy has to be extensively modified depending on the machine it runs on, and it's not even that much faster" - whereas, given that he's testing it on a machine for which Judy was not optimised, one could quite reasonably take the position of "While you have to optimise Judy for the machine it's running on to get the best out of it, even if you don't it's substantially faster than a hash table. The downside is code complexity".

Personally, while the 32-byte cache line size is mentioned as an aside in a couple of places, I'd prefer to see it acknowledged more up front. He doesn't exactly go out of his way to say that Judy would be a lot faster on a different machine.

dfox · on June 10, 2010

But Judy, as it stands, is extensively optimized for relatively common architectures (and the performance degradation on systems with 32B cache lines is not too great). In fact, most of the cache-optimized software I know of is tuned for 64B cache lines.
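A rough illustration of the arithmetic behind the 32B vs 64B point, as a minimal C sketch. The layout here is hypothetical (a bucket of four 16-byte key/value entries), not Judy's actual node format: a structure sized to fill one 64-byte line spans two lines on a machine with 32-byte lines. How much that second line actually costs depends on the hardware (for example, whether adjacent lines are prefetched), which this snippet does not model.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 16-byte entry: a 64-bit key and a 64-bit value (no padding). */
struct entry16 {
    uint64_t key;
    uint64_t value;
};

/* Hypothetical bucket sized for 64-byte cache lines: four entries, 64 bytes. */
struct bucket64 {
    struct entry16 entries[4];
};

/* Cache lines touched when scanning one bucket that starts on a line
   boundary: ceil(bucket_bytes / line_bytes). */
static size_t lines_per_bucket(size_t bucket_bytes, size_t line_bytes) {
    assert(line_bytes > 0);
    return (bucket_bytes + line_bytes - 1) / line_bytes;
}
```

With `sizeof(struct bucket64) == 64`, `lines_per_bucket(64, 64)` is 1 and `lines_per_bucket(64, 32)` is 2: the same layout costs one line fetch per bucket on a 64B-line machine and two on a 32B-line machine.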

Also, even a simple hash table benefits greatly from trivial cache-related optimizations, so I would say that while Judy is extensively optimized for one particularly common configuration, simple hash tables have to be optimized for every platform too.
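One possible form of such a "trivial" optimization, sketched in C under stated assumptions (this is an illustration, not code from the article, from Judy, or from any library): each bucket holds four 16-byte slots and is aligned to exactly one cache line, so probing a bucket touches a single line, and a full bucket spills over to the next line (linear probing at bucket granularity). The 64-byte line size, the reserved empty key 0, the Fibonacci-hashing constant, and the names `struct table`, `table_put`, and `table_get` are all hypothetical choices; deletion is not supported. Note that `CACHE_LINE` is baked in at compile time, which is exactly the per-platform tuning the comment above refers to.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define CACHE_LINE 64         /* assumption: 64-byte cache lines */
#define SLOTS_PER_BUCKET 4    /* 4 x 16-byte slots = one 64-byte line */
#define EMPTY_KEY 0           /* assumption: key 0 is reserved as "empty" */

struct slot { uint64_t key; uint64_t value; };

/* Each bucket fills exactly one cache line and starts on a line boundary. */
struct bucket { alignas(CACHE_LINE) struct slot slots[SLOTS_PER_BUCKET]; };
_Static_assert(sizeof(struct bucket) == CACHE_LINE, "bucket must fill one line");

struct table {
    struct bucket *buckets;
    size_t nbuckets;          /* must be a power of two */
};

/* Multiplicative (Fibonacci) hashing, masked to the table size. */
static size_t bucket_index(const struct table *t, uint64_t key) {
    return (size_t)((key * 0x9E3779B97F4A7C15ULL) >> 32) & (t->nbuckets - 1);
}

static int table_init(struct table *t, size_t nbuckets) {
    assert(nbuckets > 0 && (nbuckets & (nbuckets - 1)) == 0);
    /* aligned_alloc needs size to be a multiple of the alignment; it is here. */
    t->buckets = aligned_alloc(CACHE_LINE, nbuckets * sizeof(struct bucket));
    if (t->buckets == NULL) return -1;
    memset(t->buckets, 0, nbuckets * sizeof(struct bucket));
    t->nbuckets = nbuckets;
    return 0;
}

/* Insert or update. Returns 0 on success, -1 if every slot is taken. */
static int table_put(struct table *t, uint64_t key, uint64_t value) {
    assert(key != EMPTY_KEY);
    size_t i = bucket_index(t, key);
    for (size_t probed = 0; probed < t->nbuckets; probed++) {
        struct bucket *b = &t->buckets[i];
        for (int s = 0; s < SLOTS_PER_BUCKET; s++) {
            if (b->slots[s].key == key || b->slots[s].key == EMPTY_KEY) {
                b->slots[s].key = key;
                b->slots[s].value = value;
                return 0;
            }
        }
        i = (i + 1) & (t->nbuckets - 1);  /* bucket full: try the next line */
    }
    return -1;
}

/* Returns 1 and stores the value in *value if found, 0 otherwise. */
static int table_get(const struct table *t, uint64_t key, uint64_t *value) {
    if (key == EMPTY_KEY) return 0;
    size_t i = bucket_index(t, key);
    for (size_t probed = 0; probed < t->nbuckets; probed++) {
        const struct bucket *b = &t->buckets[i];
        for (int s = 0; s < SLOTS_PER_BUCKET; s++) {
            if (b->slots[s].key == key) { *value = b->slots[s].value; return 1; }
            if (b->slots[s].key == EMPTY_KEY) return 0;  /* no deletions, so stop */
        }
        i = (i + 1) & (t->nbuckets - 1);
    }
    return 0;
}

static void table_free(struct table *t) {
    free(t->buckets);
    t->buckets = NULL;
    t->nbuckets = 0;
}
```

On a machine with 32-byte lines the same code still works, but each bucket scan touches two lines, so `CACHE_LINE` and `SLOTS_PER_BUCKET` would need retuning to keep the one-line-per-probe property.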

On the other hand, for most use cases these optimizations do not matter much, but there are special cases. Python's dictionary is a great example (by the way, if I remember correctly, it is also optimized for 64B cache lines).