The annoying thing about locks (at least the variant that waits) is not just that you have to enter the kernel and wait when the lock is not available (fair enough), but also that the current holder will have to wake you, which requires another dip into the kernel by the holder.
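For anyone who wants to see where that second dip happens, here is a minimal sketch of a three-state futex lock on Linux (roughly the scheme from Drepper's "Futexes Are Tricky"). The names `sketch_mutex`, `sketch_lock`, `sketch_unlock` and `sketch_demo` are made up for illustration, and the code assumes Linux with glibc and that an `_Atomic uint32_t` has the same layout as a plain `uint32_t`. It is not how any particular library implements its mutex.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/futex.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* State: 0 = free, 1 = held with no waiters, 2 = held and waiters may exist. */
typedef struct { _Atomic uint32_t state; } sketch_mutex;

/* Thin wrapper around the raw futex syscall (glibc exposes no futex() function). */
static long futex_op(_Atomic uint32_t *addr, int op, uint32_t val) {
    return syscall(SYS_futex, (uint32_t *)addr, op, val, NULL, NULL, 0);
}

static void sketch_lock(sketch_mutex *m) {
    uint32_t c = 0;
    if (atomic_compare_exchange_strong(&m->state, &c, 1))
        return;                          /* uncontended: no syscall at all */
    if (c != 2)
        c = atomic_exchange(&m->state, 2);
    while (c != 0) {
        /* First kernel entry: the waiter sleeps while state is still 2. */
        futex_op(&m->state, FUTEX_WAIT_PRIVATE, 2);
        c = atomic_exchange(&m->state, 2);
    }
}

static void sketch_unlock(sketch_mutex *m) {
    /* Second kernel entry, paid by the holder, but only if someone may be waiting. */
    if (atomic_exchange(&m->state, 0) == 2)
        futex_op(&m->state, FUTEX_WAKE_PRIVATE, 1);
}

/* Hypothetical demo: several threads bump a shared counter under the lock. */
static sketch_mutex demo_lock;
static long demo_counter;

static void *demo_worker(void *arg) {
    long iters = *(long *)arg;
    for (long i = 0; i < iters; i++) {
        sketch_lock(&demo_lock);
        demo_counter++;
        sketch_unlock(&demo_lock);
    }
    return NULL;
}

static long sketch_demo(int nthreads, long iters) {
    pthread_t threads[16];
    assert(nthreads > 0 && nthreads <= 16);
    demo_counter = 0;
    for (int i = 0; i < nthreads; i++) {
        int rc = pthread_create(&threads[i], NULL, demo_worker, &iters);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < nthreads; i++)
        pthread_join(threads[i], NULL);
    return demo_counter;
}
```

Calling `sketch_demo(4, 10000)` should return 40000 if the lock provides mutual exclusion. The `FUTEX_WAKE_PRIVATE` call in `sketch_unlock` is the holder-side syscall being complained about; the uncontended path never enters the kernel.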
I have been thinking on and off on how to create a syscall-less wake operation. One way to get almost what you want is to have a polling io_uring. That still requires one kernel thread that busy polls per application. Maybe this is fine in some application architectures but it's not ideal.
It would be nice if there was a way to use Intel's debug registers to write a value to some address, which would then interrupt some kernel task, allowing that kernel task to somehow figure out what futex to wake, without the interrupter having to enter the kernel.
The point of locks 'waiting' is really just that they degrade nicely under heavy contention, e.g. when more threads are trying to take the lock than you have available cores/harts. Busy polling will lead to terrible performance in such conditions, whereas threads that "wait" will do the right thing and leave CPU resources free for the active tasks to progress.
I mentioned busy polling as a means to an end, with the end being the ability to wake a thread without requiring a system call (ideally without busy polling!).
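To make the trade-off concrete, here is a rough sketch of a syscall-less wake bought with busy polling: the waker's whole "wake" is a single release store, and the waiter pays for it by spinning on a core. The names `poll_channel`, `poll_waiter` and `poll_wake_demo` are hypothetical, and this is only an illustration of the idea, not a recommendation.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

typedef struct {
    atomic_int ready;  /* 0 = keep polling, 1 = payload is valid */
    int payload;       /* written by the waker before it sets ready */
    int seen;          /* what the waiter observed once it was released */
} poll_channel;

static void *poll_waiter(void *arg) {
    poll_channel *ch = arg;
    /* Busy poll: no kernel entry, but this thread occupies a core the whole time. */
    while (atomic_load_explicit(&ch->ready, memory_order_acquire) == 0) {
    }
    ch->seen = ch->payload;
    return NULL;
}

static int poll_wake_demo(int value) {
    poll_channel ch = { .payload = 0, .seen = -1 };
    atomic_init(&ch.ready, 0);

    pthread_t waiter;
    int rc = pthread_create(&waiter, NULL, poll_waiter, &ch);
    assert(rc == 0);
    (void)rc;

    ch.payload = value;
    /* The "wake": one release store, no system call on the waker's side. */
    atomic_store_explicit(&ch.ready, 1, memory_order_release);

    pthread_join(waiter, NULL);
    return ch.seen;
}
```

`poll_wake_demo(42)` returns the value the waiter saw after being released. With more spinning waiters than cores this collapses exactly as described above, which is why the goal is to keep the cheap wake while dropping the polling.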
> It would be nice if there was a way to use Intel's debug registers to write a value to some address, which would then interrupt some kernel task
What you have described is literally the syscall mechanism. That's what it is. You perform some register write (via a specific instruction) and an interrupt is taken to the kernel.
Maybe you believe that an asynchronous interrupt would cost less than a synchronous interrupt for this particular objective but I'm not sure there's evidence for that claim.
An asynchronous interrupt would be more expensive, but if you can send it to another core, you do not pay that cost on this core; in particular, you do not need to enter the kernel. This is especially useful for remote wakeups, when you want to schedule a thread on another core.
As I mentioned elsewhere, Intel was planning to add user-mode interrupts specifically for this sort of scenario.
> It would be nice if there was a way to use Intel's debug registers to write a value to some address, which would then interrupt some kernel task
Apparently Intel CPUs were supposed to get user-space interrupts, which would do exactly this. I'm not sure if hardware was ever shipped with support, though.