I am always disappointed when someone speaks poorly of std::mutex. On Linux it is as good as it can possibly be for a generic catch-all lock, and by that I mean it is really good for most use cases. If you want a spinlock to outperform std::mutex, you will at least have to do the legwork of using real-time scheduling and guaranteeing that any spinlock you lock will be unlocked within a finite amount of time with a known upper bound. Anything less and your spinlock will cause problems when your thread locks it and then gets preempted by the OS scheduler, because every waiter keeps spinning while the holder isn't running.
> On Linux it is as good as it can possibly be for a generic catch-all lock
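To make the preemption problem concrete, here is a rough sketch of the kind of naive spinlock people usually reach for. `NaiveSpinlock` and `count_with` are names made up for this example, not anything from a real library, and this is an illustration of the failure mode rather than a benchmark.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Naive test-and-set spinlock (hypothetical, for illustration only).
class NaiveSpinlock {
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
public:
    void lock() {
        // Busy-wait until the flag was previously clear. If the current
        // holder has been preempted, waiters burn CPU here until the
        // scheduler runs the holder again.
        while (flag_.test_and_set(std::memory_order_acquire)) {
        }
    }
    void unlock() { flag_.clear(std::memory_order_release); }
};

// Increment a shared counter from several threads under the given lock type.
// Works with anything that has lock()/unlock(), including std::mutex.
template <typename Lock>
long count_with(int threads, int iters) {
    assert(threads > 0 && iters >= 0);
    Lock lock;
    long counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (int i = 0; i < iters; ++i) {
                lock.lock();
                ++counter;  // critical section
                lock.unlock();
            }
        });
    }
    for (auto& th : pool) th.join();
    return counter;
}
```

Both `count_with<NaiveSpinlock>(4, 1000)` and `count_with<std::mutex>(4, 1000)` give the same count. The difference is what waiters do while the holder is descheduled: the spinlock version keeps spinning, while a futex-based mutex can put them to sleep.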
A futex is a 32-bit value aligned on a 4-byte boundary, so it needs 4 bytes. But std::mutex on Linux (glibc on x86-64) is 40 bytes, ten times larger. Now, maybe where you come from "ten times larger than it needs to be" is "as good as it can possibly be", but where I come from that's not very good.
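If you want to check the numbers on your own machine, something like the following should do. The 40-byte figure is what I'd expect from glibc on x86-64; other platforms and standard libraries differ, so the sketch only reads the sizes and doesn't hard-code them. `extra_bytes` is a helper name I made up for the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <mutex>

// A futex word is a 32-bit integer, so 4 bytes.
constexpr std::size_t kFutexWordBytes = sizeof(std::uint32_t);

// Size of std::mutex on whatever platform this is compiled for.
constexpr std::size_t kStdMutexBytes = sizeof(std::mutex);

// Guard against unsigned underflow in extra_bytes below.
static_assert(kStdMutexBytes >= kFutexWordBytes,
              "std::mutex is smaller than a futex word on this platform");

// Hypothetical helper: extra bytes spent by embedding one std::mutex per
// object instead of one bare futex word per object.
constexpr std::size_t extra_bytes(std::size_t n_objects) {
    return n_objects * (kStdMutexBytes - kFutexWordBytes);
}
```

On a platform where `sizeof(std::mutex)` is 40, `extra_bytes(1000000)` comes to 36,000,000 bytes, which matters if you put a lock in every small object.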