Multiplication is also very fast, usually one or two cycles on larger chips.
I wondered if that's really still true, since I haven't done much assembly language programming since PowerPC was new.
Here, it says the M1 has 7-9 cycles of latency for division instructions, but a throughput of one division every 2 cycles.
https://dougallj.github.io/applecpu/firestorm-int.html
"The M1 is 10x faster than the Xeon at 64 bit divides. It’s…just wow."
So, given all of the other things that can slow you down, I wonder if it really makes sense to avoid division anymore?
(I guess the energy-efficient "Icestorm" cores have throughput equal to latency, so it's only the "Firestorm" ones where it's super fast.)
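
To make the latency vs. throughput distinction concrete, here's a rough back-of-the-envelope model, not a benchmark. It only uses the figures quoted above (7-9 cycles latency, one division every 2 cycles); the function names and the simple pipeline formula are my own assumptions, and it ignores everything else a real core does.

```python
# Toy cycle model: NOT a measurement, just arithmetic on the quoted figures.

def dependent_chain_cycles(n, latency):
    """Each division needs the previous result, so the latencies add up."""
    return n * latency

def independent_cycles(n, latency, recip_throughput):
    """Divisions don't depend on each other: a new one can start every
    `recip_throughput` cycles, and the last one still needs `latency`
    cycles to finish."""
    if n == 0:
        return 0
    return (n - 1) * recip_throughput + latency

n = 1000
for latency in (7, 9):  # the 7-9 cycle range quoted for the M1
    chained = dependent_chain_cycles(n, latency)
    overlapped = independent_cycles(n, latency, recip_throughput=2)
    print(f"latency={latency}: chained={chained}, independent={overlapped}")
```

So the "2 cycles per" number only helps if the divisions are independent of each other; a loop where each divide feeds the next still pays the full latency every time. Passing a `recip_throughput` equal to `latency` makes the two cases cost the same, which is the situation described for the efficiency cores.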