(I'm the author) You're right, I should have specified -- it is glibc 2.32-48. ...

floxy · on Aug 17, 2021

Thanks for the interesting writeup. I wonder if it is because glibc has a less-optimized atan2f (for floats). The double version seems quite involved in glibc anyway:

https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/iee...

...where the atan2f version ends up calling atan, which doesn't seem as sophisticated:

https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/iee...

stephencanon · on Aug 17, 2021

Thanks for clarifying!

Without benchmarking, I would expect atan2f to be around 20-30 cycles per element or less with either Intel's or Apple's scalar math library, and proportionally faster for their vector libs.

rostayob · on Aug 17, 2021

Thanks for the info.

By the way, your writing on floating-point arithmetic is very informative -- I even cite a message of yours on FMA in the post itself!