
Note that succinct data structures may not be faster than conventional structures if your dataset fits in memory: http://www.cs.cmu.edu/~huanche1/slides/FST.pdf . Of course, for large datasets where storage access times dominate, succinct data structures win all around. In any case, succinct trees are works of art (if I recall correctly, https://arxiv.org/pdf/1805.11255 is a good exposition) (just look at how that RMQ works!)
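To make the rank+select machinery mentioned above concrete, here is a toy Python sketch (not a real succinct library, and without the o(n)-bit space guarantees): a bit vector with a precomputed directory of per-block popcounts, which is the basic building block behind succinct trees.

```python
# Toy rank/select over a bit vector. Real succinct structures store the
# directory in o(n) extra bits with a two-level layout; here it is a plain
# list, which keeps the idea visible at the cost of the space bound.

BLOCK = 64  # bits per directory block

def build_rank_dir(bits):
    """bits: list of 0/1. Cumulative popcount at each block boundary."""
    dir_ = [0]
    for i in range(0, len(bits), BLOCK):
        dir_.append(dir_[-1] + sum(bits[i:i + BLOCK]))
    return dir_

def rank1(bits, dir_, i):
    """Number of 1-bits in bits[0:i]: one directory lookup + a local scan."""
    b = i // BLOCK
    return dir_[b] + sum(bits[b * BLOCK:i])

def select1(bits, dir_, k):
    """Position of the k-th 1-bit (1-indexed), via binary search on rank."""
    lo, hi = 0, len(bits)
    while lo < hi:
        mid = (lo + hi) // 2
        if rank1(bits, dir_, mid + 1) < k:
            lo = mid + 1
        else:
            hi = mid
    return lo

bits = [1, 0, 1, 1, 0, 0, 1, 0] * 20
d = build_rank_dir(bits)
assert rank1(bits, d, 8) == 4    # four 1-bits in the first 8 positions
assert select1(bits, d, 5) == 8  # the 5th 1-bit sits at index 8
```

Note how every query is a couple of dependent memory accesses (directory, then the bits themselves); that dependency chain is exactly what the latency discussion further down the thread is about.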


As an application grows, the features start to interact. We tend not to pay much attention to how the old code 'fits into memory' while we are writing new code that also has to fit into memory.

Once you have an entire system written, the benefits of having four interacting features that each fit into a third of memory may be bigger than you think. And I don't know how well Intel's hardware-level profiling reveals that, but I know for sure that the profiling tools that ship with most commercially viable programming languages never do. They blame issues on whoever touched the system last, and if that is an epicycle or a metastable situation, then the apparent guilty party may be a frame-up while the real culprit goes unpunished.

As an example: if your workflow has a step that eventually needs 40% of available memory to solve a problem, then GC will almost always trigger within that step, rather than in the other 60% of memory being systematically wasted by heaps of inefficient code in the lead-up to it, because the top of the memory sawtooth will almost always occur within that part of the call graph. But because the 40% is your intrinsic complexity, people's brains shut off the moment a cost is attributed to unavoidable work, instead of to the avoidable work that really caused the majority of the problem.
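The attribution effect described above can be shown with a toy simulation (all numbers, stage names, and the threshold-triggered GC model are invented for illustration): prep stages leave dead allocations behind in small increments, the final stage needs one big chunk, and the collection therefore always fires inside the final stage, which is where a naive profiler pins the blame.

```python
# Toy heap model: GC fires when an allocation would overflow the heap,
# and we record which stage was running at that moment.

HEAP_LIMIT = 100
heap = 0            # dead-but-uncollected prep data plus live solve data
gc_fired_in = []

def alloc(n, stage):
    global heap
    if heap + n > HEAP_LIMIT:
        gc_fired_in.append(stage)  # the profiler blames this stage
        heap = 0                   # collection reclaims the dead prep data
    heap += n

for cycle in range(5):
    for _ in range(6):
        alloc(10, "prep")   # 60 units of avoidable garbage per cycle
    alloc(50, "solve")      # the intrinsically large step
    heap -= 50              # solve's memory is live only during the step

print(gc_fired_in)  # -> ['solve', 'solve', 'solve', 'solve', 'solve']
```

Every collection lands in "solve" even though the prep stages produced more garbage than the solve stage ever allocated.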


True, but it depends on what you mean by 'fitting in memory'.

Succinct data structures are used in genomics (e.g. bwa, or the megahit assembler) because N is so large that you're actually hitting asymptotic behavior.

For memory latency, it can help by making your memory footprint fit in the LLC or on a single node; cross-node NUMA latencies are typically enough to absolutely tank performance.

It can theoretically also help in massively parallel access situations where bandwidth becomes the limiting concern, although I suspect we would need near-memory hardware to perform the rank+select. Otherwise the latency of the multiple dependent accesses will kill your performance again (cf. the previous point).

With a lot of parallel accesses, bandwidth could also be an issue in conventional structures.


There's 'fits in memory', and there's *fits in memory*.

I have used various bit-packing schemes in order to keep data in memory. If you can keep your entire dataset in memory, it opens up different ways to tackle it. Succinct data structures look like a way to enable this.
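A sketch of the kind of bit-packing meant above (the 12-bit width is an assumption for the demo): values known to fit in 12 bits are packed back-to-back instead of being stored as full machine words, shrinking the footprint so the whole dataset can stay resident, while still allowing random access to individual values.

```python
# Pack fixed-width values into one big Python int; a production version
# would pack into a bytes/array buffer, but the shifting logic is the same.

WIDTH = 12                  # bits per value (assumed for this demo)
MASK = (1 << WIDTH) - 1

def pack(values):
    """Pack an iterable of ints < 2**WIDTH into a single bit string."""
    out = 0
    for i, v in enumerate(values):
        out |= (v & MASK) << (i * WIDTH)
    return out

def unpack(packed, i):
    """Random access to the i-th value without unpacking the rest."""
    return (packed >> (i * WIDTH)) & MASK

data = [0, 1, 4095, 1234, 42]
p = pack(data)
assert [unpack(p, i) for i in range(len(data))] == data
```

Compared to 64-bit ints, this stores the same values in 12/64 of the space; succinct structures push the same idea further by also compressing the structural information (tree shape, pointers) instead of just the payload.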

It's not just storage access times that kill you once the data doesn't fit: you need to rearrange all your processing to work in batches or windows, or use some kind of query API to do anything. Everything becomes much more painful.


I think it depends more on the ratio between access time and how often you use the data. Adding two arrays that fit in L1 is already limited by access time: on Zen 3, we can add two 32-byte vectors per cycle but only store one of them per cycle. For matrix multiplication, we can sustain the two operations per cycle (really c <- a * b + c) because we do multiple operations on the data once we have loaded it into registers.
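The back-of-envelope roofline numbers behind that comparison (the 32-byte vector and one-store-per-cycle figures come from the comment above; the rest is standard arithmetic-intensity accounting, not measured data):

```python
# Vector add c[i] = a[i] + b[i] on 8-float (32-byte) vectors:
flops_per_vec = 8                  # 8 single-precision additions
bytes_per_vec = 32 + 32 + 32       # load a, load b, store c
add_intensity = flops_per_vec / bytes_per_vec
print(f"vector add: {add_intensity:.3f} flop/byte")  # ~0.083

# Matmul of two n x n float32 matrices with ideal reuse:
n = 1024
flops = 2 * n**3                   # one multiply + one add per term
bytes_touched = 3 * n * n * 4      # read A, read B, write C once each
mm_intensity = flops / bytes_touched
print(f"matmul:     {mm_intensity:.1f} flop/byte")   # ~170 for n=1024

# The add does a fraction of a flop per byte moved, so loads/stores are the
# bottleneck; matmul does hundreds of flops per byte once operands are
# reused from registers and cache, so the FMA units become the bottleneck.
```

This is why shrinking the footprint helps the low-intensity case (it moves the data closer) but does little for the compute-bound one.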

I can see it being useful for data sets of a few dozen MBs as well.


Memory is expensive. In the cloud especially. Using a succinct structure could enable cheaper computing for specific tasks. This benefits everyone.



