Hacker News | jvz01's comments

Another good hint is the classical half-angle formulas. You can often avoid calling sin() and cos() altogether!


I have developed very fast, accurate, and vectorizable atan() and atan2() implementations, leveraging AVX/SSE capabilities. You can find them here [warning: self-signed SSL-Cert].

https://fox-toolkit.org/wordpress/?p=219


A small side note: the algorithm as given is scalar; however, it's branch-free and defined entirely in the header file, so compilers will typically be able to vectorize it and thus achieve a speed-up that scales directly with the vector size. I see potential [but architecture-dependent] optimization in using Estrin's scheme to evaluate the polynomial.
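For readers unfamiliar with Estrin's scheme, a rough sketch in C follows, comparing it with Horner's rule on a generic degree-7 polynomial. The function names and the coefficient layout (c[0] is the constant term) are assumptions for illustration; this is not the code or the coefficients from the linked post.

```c
/* Horner's rule: each step depends on the previous one, so the
   multiply-adds form one long sequential chain. */
static double poly7_horner(const double c[8], double x)
{
    double r = c[7];
    for (int i = 6; i >= 0; --i)
        r = r * x + c[i];
    return r;
}

/* Estrin's scheme: the four linear pairs are independent of each other,
   so a superscalar or SIMD core can overlap them; they are then combined
   using x^2 and x^4. */
static double poly7_estrin(const double c[8], double x)
{
    double x2 = x * x;
    double x4 = x2 * x2;
    double p01 = c[0] + c[1] * x;
    double p23 = c[2] + c[3] * x;
    double p45 = c[4] + c[5] * x;
    double p67 = c[6] + c[7] * x;
    double q0 = p01 + p23 * x2; /* terms of degree 0..3 */
    double q1 = p45 + p67 * x2; /* terms of degree 4..7, divided by x^4 */
    return q0 + q1 * x4;
}
```

Both functions compute the same polynomial, but rounding can differ slightly because the operations are ordered differently. Whether Estrin is actually faster depends on the target's FMA latency and throughput, which is the architecture-dependent part.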


Your result is significantly slower than the versions presented in the article, though yours has more terms and so may be more accurate.


Yes, the aim was to be accurate down to 1 lsb while being significantly faster. Feel free to drop terms from the polynomial if you can live with less accurate results!

The coefficients were generated by a package called Sollya; I've used it a few times to develop accurate Chebyshev approximations for functions.

Abramowitz & Stegun is another good reference.


Please, would you mind updating your blog post one of these days with the instructions you gave to Sollya? I'm trying something stupid with log1p and can't get Sollya to help, mostly because I'm not putting in enough time to read all the docs...

