[GPU] Enable f4_e2m1 jit gemm #2442

kealan-barbieri · 2025-01-17T18:23:15Z

Description

Enable f4_e2m1 in jit::gemm.

Partially covers MFDNN-124711
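
For readers unfamiliar with the data type, here is a minimal host-side sketch of how an f4_e2m1 value decodes to a float. It assumes the usual 4-bit layout (1 sign bit, 2 exponent bits with bias 1, 1 mantissa bit, no inf/NaN encodings) and that two values are packed per byte with the low nibble first. The helper names `decode_f4_e2m1` and `unpack_f4_pair` are hypothetical and are not taken from this PR or from the oneDNN sources; the JIT generator performs its conversions with GPU instructions instead.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical helper (not part of oneDNN): decode one f4_e2m1 nibble.
// Assumed layout: bit 3 = sign, bits 2..1 = exponent (bias 1), bit 0 = mantissa.
inline float decode_f4_e2m1(std::uint8_t nibble) {
    const int sign = (nibble >> 3) & 0x1;
    const int exp = (nibble >> 1) & 0x3;
    const int man = nibble & 0x1;

    float magnitude;
    if (exp == 0) {
        // Subnormal range: no implicit leading one, so the value is 0 or 0.5.
        magnitude = 0.5f * static_cast<float>(man);
    } else {
        // Normal range: (1 + man/2) * 2^(exp - 1), giving 1, 1.5, 2, 3, 4, 6.
        magnitude = std::ldexp(1.0f + 0.5f * static_cast<float>(man), exp - 1);
    }
    return sign ? -magnitude : magnitude;
}

// Hypothetical helper: unpack the two f4_e2m1 values stored in one byte.
// Low-nibble-first ordering is an assumption made for this sketch.
inline void unpack_f4_pair(std::uint8_t byte, float &lo, float &hi) {
    lo = decode_f4_e2m1(byte & 0x0F);
    hi = decode_f4_e2m1((byte >> 4) & 0x0F);
}
```

Under these assumptions the format represents only the magnitudes 0, 0.5, 1, 1.5, 2, 3, 4, and 6 (plus their negatives), which is why the GEMM kernels upconvert to a wider type before accumulating.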

Checklist

General

Do all unit and benchdnn tests (make test and make test_benchdnn_*) pass locally for each commit?
Have you formatted the code using clang-format?

src/gpu/intel/jit/gemm/include/type.hpp

src/gpu/intel/jit/gemm/generator/pieces/remask.cxx

src/gpu/intel/jit/gemm/generator/pieces/layout_setup.cxx

src/gpu/intel/jit/gemm/generator/pieces/gemm_setup.cxx

src/gpu/intel/jit/gemm/generator/pieces/copy_plan.cpp

petercad · 2025-01-17T19:30:36Z

src/gpu/intel/jit/gemm/generator/pieces/copy_plan.cpp

+    // cmp  (ge)           t0:w, y:w, 31
+    // shr                 y:uw, 10
+    // csel (ge)           y:fp16,  0x7bff, y:fp16, t0:fp16
+    // csel (ze)           y:fp16, NaN:fp16, y:fp16, t1:fp16


Side note: there's a much faster sequence, though this is OK for now:

shl t0:ud x:ub 24
add t0:ud t0:ud 1
mov y:hf t0:f

src/gpu/intel/jit/gemm/generator/pieces/copy_plan.cpp

src/gpu/intel/jit/gemm/gen_gemm_kernel.cpp

kealan-barbieri · 2025-01-18T00:27:43Z

make test
disable test_device_cpu
disable build_cpu_runtime_omp
disable build_cpu_runtime_sycl
disable build_cpu_runtime_tbb
disable benchdnn_all
enable benchdnn_matmul
enable benchdnn_ip

src/gpu/intel/jit/gemm/generator/pieces/layout_setup.cxx

src/gpu/intel/compute/kernel_arg_list.hpp

echeresh · 2025-01-28T20:21:20Z

@kealan-barbieri Do we have f4_e2m1 coverage in benchdnn input files? If missing, can you please add some?

In the long term #2434 should help with that.

kealan-barbieri · 2025-01-28T21:55:51Z

@echeresh there is existing coverage: https://github.com/oneapi-src/oneDNN/blob/main/tests/benchdnn/inputs/matmul/test_matmul_fp4

src/gpu/intel/jit/gemm/include/type.hpp

src/gpu/intel/jit/gemm/generator/pieces/gemm_setup.cxx

src/gpu/intel/jit/gemm/generator/pieces/copy_plan.cpp

src/gpu/intel/jit/gemm/generator/pieces/copy_plan.hpp

src/gpu/intel/jit/gemm/selector/db/kernel.db

kealan-barbieri · 2025-02-04T01:42:29Z

make test
disable test_device_cpu
disable build_cpu_runtime_omp
disable build_cpu_runtime_sycl
disable build_cpu_runtime_tbb
disable benchdnn_all
enable benchdnn_matmul
enable benchdnn_ip

kealan-barbieri · 2025-02-04T22:45:46Z

make test
disable test_device_cpu
disable build_cpu_runtime_omp
disable build_cpu_runtime_sycl
disable build_cpu_runtime_tbb
disable benchdnn_all
enable benchdnn_matmul
enable benchdnn_ip

kealan-barbieri · 2025-02-04T22:46:53Z

make test perf-gpu
set primitive=matmul

kealan-barbieri · 2025-02-06T01:48:38Z

make test
disable test_device_cpu
disable build_cpu_runtime_omp
disable build_cpu_runtime_sycl
disable build_cpu_runtime_tbb
disable benchdnn_all
enable benchdnn_matmul
enable benchdnn_ip

src/gpu/intel/jit/gemm/generator/pieces/copy_plan.cpp

Update types with autoTypeConversions before counting outer product ops.

kealan-barbieri added the platform:gpu-intel Codeowner: @oneapi-src/onednn-gpu-intel label Jan 17, 2025

kealan-barbieri requested review from a team as code owners January 17, 2025 18:23

kealan-barbieri force-pushed the kealanba/f4_e2m1_gemm branch from ebac6cb to 44b218e Compare January 17, 2025 18:24

github-actions bot added the component:tests Codeowner: @oneapi-src/onednn-arch label Jan 17, 2025

kealan-barbieri force-pushed the kealanba/f4_e2m1_gemm branch from 44b218e to 021d757 Compare January 17, 2025 18:56