Hello,

SageAttention works very well on most GPUs. However, I tried it recently on an RTX 5090, and occasionally a single v token contains only NaN values (512 NaNs in my case) after FP8 quantization. The NaN then propagates through the rest of the attention, whose output becomes entirely NaN.
I have tracked the issue down to the call `_fused.mean_scale_fuse_quant_cuda` (the same happens with `_fused.scale_fuse_quant_cuda`) in the `per_channel_fp8` function. I don't know whether this is related, but the v token that was entirely turned into NaN contained 512 identical values (here: 0.0010), whereas the surrounding v tokens (which had 511 identical values and one different value, as these are the null-context tokens of a CFG pass) were quantized correctly. A minimal sketch of what I suspect is happening follows below.
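If it helps, here is a rough repro of the mechanism I suspect, written as a plain PyTorch mean-centered per-token FP8 quantizer. This is only my guess, not the actual `mean_scale_fuse_quant_cuda` kernel; the function name, the per-token scaling, and the 448 clip value are my own choices for illustration (requires a PyTorch version with `torch.float8_e4m3fn`).

```python
import torch

# Hypothetical mean-centered, per-token FP8 quantizer (illustration only,
# NOT SageAttention's real mean_scale_fuse_quant_cuda kernel).
def mean_scale_fp8_quant(v: torch.Tensor):
    # v: (num_tokens, dim) float32
    mean = v.mean(dim=-1, keepdim=True)             # per-token mean
    centered = v - mean                             # center the token
    amax = centered.abs().amax(dim=-1, keepdim=True)
    scale = amax / 448.0                            # 448 = max finite value of float8_e4m3fn
    q = (centered / scale).to(torch.float8_e4m3fn)  # scale == 0 -> 0/0 -> NaN for the whole token
    return q, scale, mean

# A token whose 512 values are all identical (e.g. the 0.0010 null-context
# token) becomes all zeros after mean subtraction, so amax and scale are 0
# and the division turns the entire token into NaN:
v = torch.full((1, 512), 0.0010)
q, _, _ = mean_scale_fp8_quant(v)
print(torch.isnan(q.float()).all())  # tensor(True)
```

If the real kernel divides by a scale derived from the max absolute value without guarding against a zero scale, that would explain why only the all-identical token is affected while its neighbors, which contain at least one differing value, survive.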
I am fairly sure this is a SageAttention bug, as the NaN problem does not occur if I replace it with SDPA. I see this on Windows; I don't know whether Linux is affected as well.