Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

The fundamental problem, especially wrt to 1 and 2, is that you're starting to deal more with the design and implementation of the hash table than anything else. There are two things that need to be considered for a good hash table implementation:

1. Collisions should be expected, and handled correctly.

2. The hashing algorithm should minimize collisions as much as possible.

Yes, the worst case running time is linear. However, assuming a properly defined hashing function, it's practically a linear running time over the expected number of collisions per bucket, which should be a tiny fraction of the total expected data size.

The problem that the second gives us is that proper hash algorithms are tailored to the specific problem at hand. MD5 is overkill when your entire pool of data is 1000 names. And knowing your data set makes it so you can avoid having a relatively large percentage of your data in one bucket.



Minor correction:

The worst case runtime is not O(n), it is defined by how you store your nodes in a hash bucket. If you choose to use a linked list, then searching a linked list is O(n). Also, obviously, there is no free lunch, so using a hash table in each bucket does not magically produce O(1) lookups.

Pick a data structure that has O(log n) runtime for lookups and use it when the hash table breaks down - that is, use it when searching in a bucket.

On the other hand, if your hash table has enough nodes in a bucket that you even notice the difference between O(very small n) and O(log very small n), you really should migrate to a larger hash table.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: