You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
When applying the Exp function to NaN on my CPU (Intel i7-1185G7 using the AVX3_DL target) the function returns 0.0 instead of the NaN that I expected (and that std::exp returns).
Is this a bug or performance-related feature of the highway Exp function?
Hi @nielskm , we were not focusing on NaN propagation in the design of the math library. This typically comes at a (minor but nonzero) cost.
The relevant line is return IfThenElseZero(Ge(x, kLowerBound), y);. Comparison of x=NaN returns false, so we get zero from IfThenElseZero. I suppose this could be turned into something like: IfThenZeroElse(Lt(x, kLowerBound), IfThenElse(Ge(x, kLowerBound), y, x). The first comparison is to flush tiny inputs to zero, the second keeps the result or replaces with x=NaN.
Are NaNs essential to your app?
The vscalefpd instruction should handle NaNs correctly given AVX3, so correct NaN handling should be free on those targets. It should also make the current implementation of LoadExpShortRange cheaper for those targets
EDIT: I'm not sure exactly how it's handled / I'd have to look a lot closer, but can confirm that my implementation in Julia that uses vscalefpd returns NaN/0.0/Inf as appropriate without needing any checks.
Thank you both for your detailed replies! NaN propagation is not essential in my application, but I just happened to test highway Exp against std::exp and the exponential function implemented in Eigen3. Both of the others propagate NaNs, which made me wonder if this was a deliberate design choice in highway.
@chriselrod good point about using vscale for LoadExpShortRange. We could add an AddExponent op or similar to take advantage of that, if anyone is interested.
There's also vfixupimmpd for fixups when no scaling is required, but I'm not sure it's worth the cost on other platforms.
It was indeed a deliberate design choice but that can be changed if/when someone really cares about NaN propagation :)
Activity
jan-wassenberg commentedon Oct 17, 2022
Hi @nielskm , we were not focusing on NaN propagation in the design of the math library. This typically comes at a (minor but nonzero) cost.
The relevant line is
return IfThenElseZero(Ge(x, kLowerBound), y);
. Comparison of x=NaN returns false, so we get zero fromIfThenElseZero
. I suppose this could be turned into something like:IfThenZeroElse(Lt(x, kLowerBound), IfThenElse(Ge(x, kLowerBound), y, x)
. The first comparison is to flush tiny inputs to zero, the second keeps the result or replaces with x=NaN.Are NaNs essential to your app?
chriselrod commentedon Oct 17, 2022
highway/hwy/contrib/math/math-inl.h
Lines 1124 to 1125 in 6bc3405
The
vscalefpd
instruction should handle NaNs correctly given AVX3, so correct NaN handling should be free on those targets. It should also make the current implementation ofLoadExpShortRange
cheaper for those targetshighway/hwy/contrib/math/math-inl.h
Lines 829 to 832 in 6bc3405
https://www.felixcloutier.com/x86/VSCALEFPD.html
https://uops.info/html-instr/VSCALEFPD_ZMM_ZMM_ZMM.html
EDIT: I'm not sure exactly how it's handled / I'd have to look a lot closer, but can confirm that my implementation in Julia that uses
vscalefpd
returnsNaN
/0.0
/Inf
as appropriate without needing any checks.nielskm commentedon Oct 18, 2022
Thank you both for your detailed replies! NaN propagation is not essential in my application, but I just happened to test highway Exp against std::exp and the exponential function implemented in Eigen3. Both of the others propagate NaNs, which made me wonder if this was a deliberate design choice in highway.
jan-wassenberg commentedon Oct 18, 2022
@chriselrod good point about using vscale for LoadExpShortRange. We could add an AddExponent op or similar to take advantage of that, if anyone is interested.
There's also vfixupimmpd for fixups when no scaling is required, but I'm not sure it's worth the cost on other platforms.
It was indeed a deliberate design choice but that can be changed if/when someone really cares about NaN propagation :)