jvz01's comments

jvz01 · on Aug 17, 2021

Another good hint: the classical half-angle formulas. You can often avoid calling sin() and cos() altogether!
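One example of this is the tangent half-angle substitution; whether it is the exact identity meant above is an assumption. If t = tan(θ/2), then sin θ = 2t/(1+t^2) and cos θ = (1−t^2)/(1+t^2), so code that already carries t never needs to call sin() or cos(). A minimal C sketch (the function name `sincos_from_half_tan` is hypothetical):

```c
/* Tangent half-angle substitution: given t = tan(theta/2), recover
   sin(theta) and cos(theta) with a few multiplies/adds and one divide,
   without calling sin() or cos(). */
static inline void sincos_from_half_tan(double t, double *s, double *c)
{
    double t2  = t * t;
    double inv = 1.0 / (1.0 + t2);  /* shared denominator 1 + t^2        */
    *s = 2.0 * t * inv;             /* sin(theta) = 2t / (1 + t^2)       */
    *c = (1.0 - t2) * inv;          /* cos(theta) = (1 - t^2) / (1 + t^2) */
}
```

With t = 0.5 this gives sin θ = 0.8 and cos θ = 0.6 (the 3-4-5 triangle), up to rounding.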

jvz01 · on Aug 17, 2021

I have developed very fast, accurate, and vectorizable atan() and atan2() implementations, leveraging AVX/SSE capabilities. You can find them here [warning: self-signed SSL-Cert].
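The following is NOT jvz01's implementation (which, per this thread, uses AVX/SSE and Sollya-generated coefficients). It is only a minimal sketch of the general range-reduction approach commonly used for a branch-free atan2(). As a stand-in for a proper minimax polynomial it assumes a well-known low-order approximation of atan on [-1, 1], accurate only to roughly 1e-3 radians; `atan_unit` and `atan2_approx` are hypothetical names.

```c
#include <math.h>

/* Low-order stand-in for atan(x) on |x| <= 1 (error on the order of
   1e-3 rad). Replace with a higher-order minimax polynomial for accuracy. */
static inline float atan_unit(float x)
{
    const float pi_4 = 0.78539816f;
    float ax = fabsf(x);
    return pi_4 * x - x * (ax - 1.0f) * (0.2447f + 0.0663f * ax);
}

/* atan2 via range reduction: divide the smaller magnitude by the larger so
   the ratio lies in [0, 1], then fix up the octant and quadrant with
   selects, which compilers can usually lower to blends/cmov, not branches. */
static inline float atan2_approx(float y, float x)
{
    const float pi   = 3.14159265f;
    const float pi_2 = 1.57079633f;
    float ax = fabsf(x), ay = fabsf(y);
    int swap  = ay > ax;                        /* |angle| above pi/4?      */
    float num = swap ? ax : ay;
    float den = swap ? ay : ax;
    float r = (den > 0.0f) ? num / den : 0.0f;  /* r in [0, 1]; 0 for (0,0) */
    float a = atan_unit(r);                     /* angle in first octant    */
    a = swap ? pi_2 - a : a;                    /* reflect across y = x     */
    a = (x < 0.0f) ? pi - a : a;                /* left half-plane          */
    return copysignf(a, y);                     /* lower half-plane         */
}
```

Only the polynomial in `atan_unit` determines accuracy; the reduction and quadrant fix-up stay the same when more terms are added.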

https://fox-toolkit.org/wordpress/?p=219

jvz01 · on Aug 17, 2021

Little side-note: the algorithm as given is scalar; however, it's branch-free and defined entirely in the header file, so compilers will typically be able to vectorize it and achieve a speedup roughly proportional to the vector width. I see potential (but architecture-dependent) optimization in using the Estrin scheme to evaluate the polynomial.
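To illustrate the Estrin remark, a sketch evaluating the same degree-7 polynomial both ways. The function names and the choice of degree 7 are illustrative assumptions, not taken from the linked code.

```c
/* Evaluate p(x) = c[0] + c[1] x + ... + c[7] x^7 two ways. */

/* Horner: 7 dependent multiply-adds, one long dependency chain. */
static inline double poly7_horner(const double c[8], double x)
{
    double r = c[7];
    for (int i = 6; i >= 0; --i)
        r = r * x + c[i];
    return r;
}

/* Estrin: group terms so independent multiply-adds can run in parallel;
   about 3 levels of combination instead of 7 sequential steps. */
static inline double poly7_estrin(const double c[8], double x)
{
    double x2 = x * x;
    double x4 = x2 * x2;
    /* level 1: four independent linear terms */
    double p01 = c[0] + c[1] * x;
    double p23 = c[2] + c[3] * x;
    double p45 = c[4] + c[5] * x;
    double p67 = c[6] + c[7] * x;
    /* level 2: combine pairs using x^2 */
    double p03 = p01 + p23 * x2;
    double p47 = p45 + p67 * x2;
    /* level 3: combine halves using x^4 */
    return p03 + p47 * x4;
}
```

Both compute the same polynomial; Estrin spends a couple of extra multiplies (x^2, x^4) to shorten the dependency chain, so any gain depends on how many multiply-add units the target can keep busy.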

ghusbands · on Aug 17, 2021

Your result is significantly slower than the versions presented in the article, though yours has more terms and so may be more accurate.

jvz01 · on Aug 17, 2021

Yes, the aim was to be accurate down to 1 LSB while being significantly faster. Feel free to drop terms from the polynomial if you can live with less accurate results!
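One way to check an "accurate to 1 LSB" claim is to measure the distance, in units in the last place (ULPs), between the approximation and a trusted reference over many inputs. A minimal C sketch, assuming IEEE-754 binary64 doubles and a reference you actually trust (libm is not guaranteed to be correctly rounded); the helper names are hypothetical:

```c
#include <stdint.h>
#include <string.h>

/* Map a double's bit pattern onto a monotonically ordered signed integer,
   so adjacent representable doubles differ by exactly 1. */
static inline int64_t ordered_bits(double x)
{
    int64_t i;
    memcpy(&i, &x, sizeof i);           /* type-pun without undefined behavior */
    return i < 0 ? INT64_MIN - i : i;   /* negative doubles: flip ordering     */
}

/* Distance in ULPs between two finite doubles (+0.0 and -0.0 count as equal). */
static inline uint64_t ulp_distance(double a, double b)
{
    int64_t ia = ordered_bits(a), ib = ordered_bits(b);
    return ia > ib ? (uint64_t)ia - (uint64_t)ib
                   : (uint64_t)ib - (uint64_t)ia;
}
```

Taking the maximum of `ulp_distance(approx(x), reference(x))` over a dense sample of inputs shows how much accuracy is lost as polynomial terms are dropped.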

The coefficients were generated by a package called Sollya; I've used it a few times to develop accurate Chebyshev approximations for functions.

Abramowitz & Stegun is another good reference.

touisteur · on Aug 17, 2021

Please, would you mind updating your blog post one of these days with the instructions you gave to Sollya? I'm trying something stupid with log1p and can't get Sollya to help, mostly because I'm not putting in enough time to read all the docs...