> Per-connection processes does make some sense to me, but it seems wasteful when most connections to a DB are idle most of the time.

Per-process vs per-thread overhead isn't that different on e.g. Linux. Some things are more expensive with multiple processes (more page tables, more wasted space, increased process switch cost), others are cheaper (e.g. memory allocation, although that has been getting better for threads over the last few years).

> Having to coordinate locks cross-process also seems wasteful; more syscalls and context switches than should be necessary.

I don't think there's a meaningful difference here. We use atomic operations for the non-sleeping lock paths (which wouldn't be any different with threads). For sleeping locks, when we actually need to sleep, we use semaphores for directed wakeups - but you'd need something similar for threads as well.
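
To make the shape of that concrete, here is a minimal sketch of a lock living in shared memory, with an atomic-only fast path and a semaphore that is only touched under contention. It is not PostgreSQL's actual lock code: `shared_state`, `demo_lock`, `demo_unlock` and `run_demo` are hypothetical names, and it uses one shared semaphore rather than a per-backend one, so the wakeup is not directed at a specific waiter. It assumes Linux or a similar POSIX system with process-shared unnamed semaphores.

```c
#define _DEFAULT_SOURCE          /* for MAP_ANONYMOUS under strict -std= modes */
#include <assert.h>
#include <semaphore.h>
#include <stdatomic.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical illustration type; lives in memory shared by all processes. */
typedef struct {
    atomic_int users;   /* 0 = free, n = one holder plus (n - 1) waiters */
    sem_t      wakeup;  /* process-shared; only used when there is contention */
    long       counter; /* the data protected by the lock */
} shared_state;

static void demo_lock(shared_state *s)
{
    /* Fast path: a single atomic op and no syscall if the lock was free. */
    if (atomic_fetch_add(&s->users, 1) > 0) {
        /* Slow path: sleep until a releaser posts; retry if interrupted. */
        while (sem_wait(&s->wakeup) != 0) { }
    }
}

static void demo_unlock(shared_state *s)
{
    /* Only pay for a wakeup syscall if somebody is actually waiting. */
    if (atomic_fetch_sub(&s->users, 1) > 1)
        sem_post(&s->wakeup);
}

/* Forks nprocs children that each increment the counter iters times. */
long run_demo(int nprocs, int iters)
{
    shared_state *s = mmap(NULL, sizeof *s, PROT_READ | PROT_WRITE,
                           MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    assert(s != MAP_FAILED);

    atomic_init(&s->users, 0);
    int rc = sem_init(&s->wakeup, 1 /* shared between processes */, 0);
    assert(rc == 0);
    s->counter = 0;

    for (int p = 0; p < nprocs; p++) {
        pid_t pid = fork();
        assert(pid >= 0);
        if (pid == 0) {
            for (int i = 0; i < iters; i++) {
                demo_lock(s);
                s->counter++;
                demo_unlock(s);
            }
            _exit(0);
        }
    }
    while (wait(NULL) > 0) { }  /* reap all children */

    long result = s->counter;
    sem_destroy(&s->wakeup);
    munmap(s, sizeof *s);
    return result;
}
```

Swap `fork()` for `pthread_create()` and the `mmap()` for a plain allocation, and `demo_lock`/`demo_unlock` stay the same, which is the point: the lock paths themselves don't care whether the contenders are processes or threads. On older glibc versions this may need `-pthread` at link time.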

Really, the majority of the cost of the process model shows up when you explicitly want to share more state after the processes have initialized. It's e.g. a lot harder to dynamically scale the size of the buffer pool up/down. It's also one of the things that made intra-query parallelism harder.