Added cudnn_frontend api in caffe to support CUDA11+cuDNN8 #2184

cheneeheng · 2023-03-23T16:23:20Z

I tested this setup with CUDA 11.7 + cuDNN 8.5 on a GTX 1660 Ti. OpenPose runs human pose extraction normally, without the huge GPU memory usage issue. GPU memory usage is the same as with the CUDA 10.2 + cuDNN 7 setup, and inference is about 1 fps faster.

Hope this helps anyone who badly needs to use CUDA 11.

Changelog:

- added cudnn-frontend submodule.
- updated cmake with new flag and new 3rdparty repository cudnn_frontend.
- changed caffe submodule repo target:
  - added DUSE_CUDNN_FRONTEND option. With cuDNN8, it uses the frontend api instead of the current algorithm wrapper cudnnGetConvolutionForwardAlgorithm_v7.
  - added cudnn_v8_utils.hpp + cudnn_v8_utils.cpp files for the cudnn_frontend api. It currently supports only the forward pass.
  - fixed warnings.
  - reduced GPU memory usage by setting CUDNN_STREAMS_PER_GROUP=1.
  - added compute capability check in tensor creation to enable tensor core usage on Ampere cards.
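For readers who want a feel for the last two changelog items, here is a minimal sketch in plain C++. It is not the code from this PR. Both helper names (`is_ampere_or_newer`, `total_workspace_bytes`) are hypothetical, and the sketch assumes that the convolution layer reserves one workspace per stream per group and that "Ampere" means compute capability 8.x or later. The stream count of 3 in the usage note is an illustrative value, not something stated in this PR.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not from the PR): true when the CUDA compute
// capability is 8.x or later, i.e. Ampere or newer. In real code the
// major/minor values would come from cudaGetDeviceProperties().
constexpr bool is_ampere_or_newer(int major, int /*minor*/) {
  return major >= 8;
}

// Hypothetical helper (not from the PR): total workspace size, ASSUMING one
// workspace of per_stream_bytes is reserved for every stream of every
// convolution group. Lowering streams_per_group shrinks the total linearly.
inline std::size_t total_workspace_bytes(std::size_t per_stream_bytes,
                                         int groups,
                                         int streams_per_group) {
  assert(groups > 0 && streams_per_group > 0);
  return per_stream_bytes * static_cast<std::size_t>(groups) *
         static_cast<std::size_t>(streams_per_group);
}

// Compile-time sanity checks on the capability predicate.
static_assert(is_ampere_or_newer(8, 0), "8.0 counts as Ampere or newer");
static_assert(!is_ampere_or_newer(7, 5), "7.5 (Turing) does not");
```

Under these assumptions, `total_workspace_bytes(w, 1, 3)` is three times `total_workspace_bytes(w, 1, 1)`, which is the kind of saving the CUDNN_STREAMS_PER_GROUP=1 change aims at. The GTX 1660 Ti used for testing reports compute capability 7.5, so `is_ampere_or_newer(7, 5)` is false for it.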

- added cudnn-frontend submodule - updated cmake - changed caffe submodule repo target

Added support for using cudnn_frontend api in caffe

d5b9667
