[Bug] 5090 GPU is not supported #3283

Open
2 tasks done
1273603741 opened this issue May 16, 2025 · 12 comments

Comments

@1273603741

Prerequisite

Environment

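I can't get an environment working on the 5090 GPU at all; no matter what I try, it reports that the mmcv library is missing. Please add support for CUDA 12.8 soon so the 5090 can be used.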

Reproduces the problem - code sample

Traceback (most recent call last):
  File "./tools/train.py", line 287, in <module>
    main()
  File "./tools/train.py", line 276, in main
    train_model(
  File "/home/super/lwy/rcbevdet/mmdet3d/apis/train.py", line 351, in train_model
    train_detector(
  File "/home/super/lwy/rcbevdet/mmdet3d/apis/train.py", line 227, in train_detector
    model = MMDistributedDataParallel(
  File "/home/super/anaconda3/envs/rcbevdet/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 646, in __init__
    _verify_param_shape_across_processes(self.process_group, parameters)
  File "/home/super/anaconda3/envs/rcbevdet/lib/python3.8/site-packages/torch/distributed/utils.py", line 89, in _verify_param_shape_across_processes
    return dist._verify_params_across_processes(process_group, tensors, logger)
RuntimeError: CUDA error: invalid device function
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 899262) of binary: /home/super/anaconda3/envs/rcbevdet/bin/python
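
Editor's sketch, not part of the original report: "CUDA error: invalid device function" commonly means the loaded binaries contain no kernel compiled for the GPU's architecture. An RTX 5090 reports compute capability 12.0 (sm_120), so one way to narrow the problem down is to compare that value with the architectures the installed PyTorch build was compiled for. The helper name `build_covers_device` is hypothetical, and its matching rule is a simplified reading of CUDA's compatibility rules rather than PyTorch's own check. It also says nothing about separately compiled extensions such as the mmcv ops, which have their own architecture list.

```python
import re
from typing import Iterable, Tuple

# Matches tags such as "sm_86", "sm_120", "compute_90" (optional letter suffix).
_ARCH_RE = re.compile(r"^(sm|compute)_(\d{2,})[a-z]?$")


def build_covers_device(capability: Tuple[int, int], arch_list: Iterable[str]) -> bool:
    """Simplified heuristic: does a build's architecture list cover this GPU?

    capability: (major, minor) compute capability of the device.
    arch_list:  tags in the style returned by torch.cuda.get_arch_list().
    """
    dev_major, dev_minor = capability
    for arch in arch_list:
        match = _ARCH_RE.match(arch)
        if match is None:
            continue  # ignore tags this simplified parser does not understand
        kind, digits = match.group(1), match.group(2)
        major, minor = int(digits[:-1]), int(digits[-1])
        if kind == "sm" and major == dev_major and minor <= dev_minor:
            return True  # binary kernels built for the same major architecture
        if kind == "compute" and (major, minor) <= (dev_major, dev_minor):
            return True  # PTX that the driver can JIT-compile for a newer GPU
    return False


def check_live_pytorch() -> None:
    """Print the comparison for the local PyTorch install, if one is usable."""
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment.")
        return
    if not torch.cuda.is_available():
        print("PyTorch cannot see a CUDA device.")
        return
    capability = torch.cuda.get_device_capability(0)
    archs = torch.cuda.get_arch_list()
    print("device capability:", capability)
    print("build arch list:", archs)
    print("covered by this build:", build_covers_device(capability, archs))


if __name__ == "__main__":
    check_live_pytorch()
```

If the script reports that the device is not covered, the PyTorch wheel itself predates the GPU. If it is covered and the error persists, the compiled extension modules are the next thing to inspect.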

Reproduces the problem - command or script

Add support for the 5090 GPU.

Additional information

No response

@wilsonlv

I don't know whether this project is still alive. Using cu128 on 50-series GPUs produces all sorts of errors.

@1273603741
Author

You can try mmcv 1.7.2. Some things in it have to be changed before it works with 50-series GPUs.

@wilsonlv

Could you explain how exactly you solved it?

@1273603741
Author

I downloaded mmcv 1.7.2 and changed some of its contents, and then the code also had to be adapted to a newer PyTorch version. It was quite a hassle and took about 3 hours to get working.

@wilsonlv

Could you please share the modified source code? Thanks.

@jzy12312

Could you put the code in a git repository and share it? Thanks.

@516525465

How did you solve it? I'd really like to run my earlier projects on a new 50-series GPU. I was previously using mmcv 0.30.0.

@1273603741
Author

I got help from a technician to solve it. Some things were changed in the mmcv code; it was quite a hassle.

@jhluaa

jhluaa commented May 29, 2025

So what did you change? It's been several days, man.

@wilsonlv

I've figured it out, everyone. I've written up the steps in a document you can refer to:
https://gitee.com/Wilson_Lws/MuseTalk-50Series-Adaptation/blob/master/README.md

@1273603741
Author

Awesome.

@ShihaoHan19980609

I built and installed it from source by following the official documentation and it worked, inside WSL2 with CUDA 12.8 + PyTorch 2.7.
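
Editor's sketch, not part of the original comment: before rebuilding mmcv from source, it can help to confirm that the active environment really has the combination reported to work here (CUDA 12.8 with PyTorch 2.7). The helper names `parse_version` and `meets_minimum` are hypothetical, and the default minimums are taken from this comment as an assumption, not as a verified requirement of mmcv or PyTorch.

```python
import re
from typing import Optional, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Parse the leading numeric part of a version string such as '2.7.0+cu128'."""
    core = text.split("+", 1)[0]  # drop local build tags such as '+cu128'
    numbers = []
    for piece in core.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component (e.g. 'dev2025')
        numbers.append(int(match.group(0)))
    return tuple(numbers)


def meets_minimum(
    torch_version: str,
    cuda_version: Optional[str],
    min_torch: Tuple[int, ...] = (2, 7),   # assumption taken from the comment above
    min_cuda: Tuple[int, ...] = (12, 8),   # assumption taken from the comment above
) -> bool:
    """Return True if both versions are at least the given minimums."""
    if cuda_version is None:
        return False  # CPU-only PyTorch builds report no CUDA version
    return (
        parse_version(torch_version) >= min_torch
        and parse_version(cuda_version) >= min_cuda
    )


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment.")
    else:
        print("torch:", torch.__version__, "| built for CUDA:", torch.version.cuda)
        print("meets assumed minimum:", meets_minimum(torch.__version__, torch.version.cuda))
```

A passing check only confirms the PyTorch side; the mmcv ops still have to be compiled against that same PyTorch and CUDA toolkit.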
