Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

In his book "An Introduction to Database Systems", DB2 Architect CJ Date describes an index design that compresses indexes of similar strings. So if entries in the index block start with the same first few letters, the first entry has the complete string, and subsequent entries use the format: <count-of-matching-characters> <remaining-characters-that-are-different-than-previous>. The algorithm burns some CPU to conserve I/O.

For example:

0 let 3 ter 6 s 4 uce

For the words: let, letter, letters, lettuce.

I can't speak to whether Postgres uses that index format; I haven't checked. Just saying an index as described above is a textbook idea that's had already been around for a long time when I learned it some 30 years ago.

I also want to share, Date's book is the best book I've ever read on information processing theory. It brought several foundational ideas into my skill set.



Sure, there are lots of indices that format things in different ways (sounds like delta encoding). Just not Postgres btrees.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: