[SYCL][CUDA] performance issue with a SYCL program #16696

Open
jinz2014 opened this issue Jan 20, 2025 · 0 comments

Comments

@jinz2014
Contributor

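Hello,
There seems to be a performance gap between the CUDA and SYCL versions of this program on an NVIDIA A100 GPU: with the same arguments, the SYCL kernels take roughly 9x to 25x longer than the CUDA kernels.
I tried SYCLomatic to translate the CUDA program, but the translation was not successful.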

https://github.com/zjin-lcf/HeCBench/tree/master/src/scatter-cuda

CUDA (12.5)

./main 10000000 100
INT32 scatter (mul, div, sum, min, max)
Average execution time of kernel: 609.347046 (us)
Average execution time of kernel: 513.615234 (us)
Average execution time of kernel: 224.066589 (us)
Average execution time of kernel: 224.341263 (us)
Average execution time of kernel: 224.259125 (us)

https://github.com/zjin-lcf/HeCBench/tree/master/src/scatter-sycl

SYCL (icpx 2025.0.0)

./main 10000000 100
INT32 scatter (mul, div, sum, min, max)
Average execution time of kernel: 5594.654785 (us)
Average execution time of kernel: 5526.372559 (us)
Average execution time of kernel: 5501.559570 (us)
Average execution time of kernel: 5502.131348 (us)
Average execution time of kernel: 5501.163086 (us)
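
For reference, the per-operation slowdown can be worked out from the two runs above. This is only a minimal Python sketch: it assumes the five timing lines follow the order given in the header line (mul, div, sum, min, max), which the output itself does not confirm, and the names `OPS`, `CUDA_US`, `SYCL_US` and `slowdowns` are made up for illustration.

```python
# Kernel times in microseconds, copied from the two runs reported above.
# Assumption: the five lines are in the header's order (mul, div, sum, min, max).
OPS = ["mul", "div", "sum", "min", "max"]
CUDA_US = [609.347046, 513.615234, 224.066589, 224.341263, 224.259125]
SYCL_US = [5594.654785, 5526.372559, 5501.559570, 5502.131348, 5501.163086]


def slowdowns(cuda_us, sycl_us):
    """Return SYCL time divided by CUDA time for each pair of measurements."""
    return [s / c for c, s in zip(cuda_us, sycl_us)]


if __name__ == "__main__":
    for op, ratio in zip(OPS, slowdowns(CUDA_US, SYCL_US)):
        print(f"{op}: SYCL is {ratio:.1f}x slower than CUDA")
```

The SYCL times are nearly the same for all five operations (about 5.5 ms), while the CUDA times vary from about 0.22 ms to 0.61 ms, so the ratio is largest for the last three lines.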
