Hello. I'm using `deberta-v3-base` for a text classification task. After training, I convert the PyTorch model to ONNX format. Everything works like a charm, except that the resulting model is twice the size of the original DeBERTa, ~750 MB. Because of this I want to convert it with mixed precision, i.e. fp16. I tried two approaches (one of them: `model.half()` before the ONNX export).
But in both cases I get this error during inference on CPU:
2023-01-06 10:46:46.332352649 [W:onnxruntime:, constant_folding.cc:179 ApplyImpl] Could not find a CPU kernel and hence can't constant fold LayerNormalization node 'LayerNorm_1'
2023-01-06 10:46:46.414666254 [W:onnxruntime:, constant_folding.cc:179 ApplyImpl] Could not find a CPU kernel and hence can't constant fold LayerNormalization node 'LayerNorm_1'
2023-01-06 10:46:46.425605272 [W:onnxruntime:, constant_folding.cc:179 ApplyImpl] Could not find a CPU kernel and hence can't constant fold LayerNormalization node 'LayerNorm_1'
I also tried setting `use_gpu=True` in the `optimize_model` method. The errors disappeared, but inference was 3-4 times slower.