
BUG: RTX 50XX nan returned by _fused.mean_scale_fuse_quant_cuda and _fused.scale_fuse_quant_cuda #164


Open
deepbeepmeep opened this issue Apr 30, 2025 · 0 comments


Hello

Sage attention works very well on most GPUs. However, I recently tried it on an RTX 5090, and occasionally one v token contains only NaN values (in my case, 512 NaNs) after being fp8-quantized. This NaN then propagates through the rest of the attention output, which becomes entirely NaN.

I have tracked the issue to the call to `_fused.mean_scale_fuse_quant_cuda` (and likewise `_fused.scale_fuse_quant_cuda`) in the `per_channel_fp8` function. I don't know whether this is related, but the v token that was entirely turned into NaN contained 512 identical values (here: 0.0010), which was not the case for the surrounding v tokens (those had 511 identical values and one different value, as these are the null-context tokens of a CFG pass).

I am sure this is a Sage bug, since the NaN problem goes away if I replace the call with sdpa. I see this problem on Windows; I don't know whether Linux is also affected.
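For context, here is a minimal NumPy sketch of what a per-channel fp8 (E4M3) quantization step typically looks like, and of one way NaNs can appear in such a step (a zero per-channel scale making the dequantization divide 0 by 0). The function names, the `eps` guard, and all details are my own assumptions for illustration, not SageAttention's actual kernel code.

```python
import numpy as np

FP8_E4M3_MAX = 448.0  # largest representable magnitude in fp8 E4M3

def per_channel_fp8_quantize(v, eps=1e-12):
    """Simulate per-channel fp8 quantization of v (tokens x channels).

    Returns the simulated quantized values and per-channel scales.
    The eps guard keeps the scale nonzero even for an all-zero channel;
    without it, quantizing computes 0/0 = NaN for that channel.
    """
    amax = np.abs(v).max(axis=0, keepdims=True)    # per-channel abs-max
    scale = np.maximum(amax, eps) / FP8_E4M3_MAX   # guarded scale
    q = np.clip(np.round(v / scale), -FP8_E4M3_MAX, FP8_E4M3_MAX)
    return q, scale

def dequantize(q, scale):
    return q * scale

# A token whose 512 values are all identical (0.0010), as in the report:
v = np.full((4, 512), 0.0010, dtype=np.float32)
q, scale = per_channel_fp8_quantize(v)
assert not np.isnan(dequantize(q, scale)).any()
```

This is only a hypothesis about where the NaN could come from; the actual fused CUDA kernel may fail for a different reason (e.g. an architecture-specific issue on sm_120).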
