I wouldn't call this a performance bug in clang. It's an optimization working as intended.



I would challenge you to find a processor on which rsqrt plus two Newton-Raphson iterations is not slower than a plain sqrt. (We don't know what -mtune the author used.)
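For readers following along, here is a minimal C sketch of the sequence being debated. It assumes x86 with SSE: `_mm_rsqrt_ss` gives an estimate of 1/sqrt(a) good to about 12 bits, each Newton-Raphson step roughly doubles the number of correct bits, and sqrt(a) is recovered as a * rsqrt(a). The helper names (`rsqrt_estimate`, `nr_step`, `approx_sqrt`) are hypothetical. This is not clang's actual output, and the non-SSE fallback (the well-known 0x5f3759df bit trick) is a swapped-in, much less accurate stand-in for the hardware estimate.

```c
#include <stdint.h>
#include <string.h>

#if defined(__SSE__)
#include <xmmintrin.h>
/* Hardware estimate: RSQRTSS, relative error at most 1.5 * 2^-12. */
static float rsqrt_estimate(float a) {
    return _mm_cvtss_f32(_mm_rsqrt_ss(_mm_set_ss(a)));
}
#else
/* Assumption: portable stand-in for RSQRTSS using the classic
   bit-level initial guess (only accurate to a few percent). */
static float rsqrt_estimate(float a) {
    uint32_t i;
    float y;
    memcpy(&i, &a, sizeof i);
    i = 0x5f3759dfu - (i >> 1);
    memcpy(&y, &i, sizeof y);
    return y;
}
#endif

/* One Newton-Raphson step on f(y) = 1/y^2 - a:
   y' = y - f(y)/f'(y) = y * (1.5 - 0.5 * a * y * y). */
static float nr_step(float a, float y) {
    return y * (1.5f - 0.5f * a * y * y);
}

/* sqrt(a) = a * rsqrt(a), refined with `iterations` NR steps.
   Hypothetical helper; only valid for finite a > 0. */
static float approx_sqrt(float a, int iterations) {
    float y = rsqrt_estimate(a);
    for (int k = 0; k < iterations; ++k)
        y = nr_step(a, y);
    return a * y;
}
```

Note that a = 0 yields NaN here (0 times infinity from the estimate), so compiler-generated versions of this trick need an extra compare and select to return 0 for zero inputs.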


According to Intel, any processor before Skylake (see section 15.12 of [1]).

[1]: https://cdrdv2.intel.com/v1/dl/getContent/814198?fileName=24...


The author probably didn't pass any -mtune setting, which is likely the problem. If you look at older cores in Agner Fog's instruction tables, SQRT has been getting steadily faster over time. This implementation is slightly faster on old Intel machines, for example.