[Bug] RTX 5090 GPU is not supported #3283
Comments
Not sure whether this project is still maintained; 50-series GPUs hit all kinds of errors with cu128.
OP, how exactly did you solve this?
I downloaded the mmcv 1.7.2 source, changed some of its code, and also had to adapt the project code to the newer PyTorch version. It was quite a hassle; it took me 3 hours to get it working.
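The thread never says exactly which edits were made. Purely as an illustration, one common class of fixes when porting mmcv 1.x-era code to PyTorch 2.x is replacing internal APIs that were removed, such as the `torch._six` module; a hypothetical shim could look like:

```python
# Hypothetical compatibility shim (illustrative only; the commenter's actual
# mmcv edits are not shown in this thread).
# PyTorch 2.x removed the internal torch._six module, so old imports like
#     from torch._six import string_classes
# found in mmcv 1.x-era code can be replaced with plain Python types:
string_classes = (str, bytes)

def is_str_like(obj):
    """Mirror the old string_classes isinstance check without torch._six."""
    return isinstance(obj, string_classes)

print(is_str_like("hello"))  # True
print(is_str_like(42))       # False
```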
Could you please share the modified source? Thanks.
Could you push the code to a Git repo and share it? Thanks.
OP, how did you solve it? I'm really hoping to run my earlier projects on a new 50-series card.
I found someone with the technical know-how to help me solve it; some things were changed inside the mmcv code. It's quite involved.
What did you change? It's been days, man.
Figured it out, guys. I've put together a write-up you can refer to:
Awesome.
I built and installed from source following the official docs and it worked, inside WSL2, with CUDA 12.8 + PyTorch 2.7.
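For reference, a minimal sketch of such a source build, assuming WSL2 with CUDA 12.8 and PyTorch 2.7 already installed (the exact tag and flags are assumptions, not confirmed by the commenter):

```shell
# Sketch only: build mmcv from source so its CUDA ops are compiled for the
# RTX 5090 (Blackwell, compute capability 12.0 / sm_120).
git clone --branch v1.7.2 https://github.com/open-mmlab/mmcv.git
cd mmcv
# MMCV_WITH_OPS=1 builds the CUDA extensions; TORCH_CUDA_ARCH_LIST pins the
# target architecture so the compiled kernels match the GPU.
MMCV_WITH_OPS=1 TORCH_CUDA_ARCH_LIST="12.0" pip install -v -e .
```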
Prerequisite
Environment
The environment simply cannot be set up on an RTX 5090; it always reports that the mmcv library is missing. Please add CUDA 12.8 support so the 5090 works.
Reproduces the problem - code sample
Traceback (most recent call last):
File "./tools/train.py", line 287, in <module>
main()
File "./tools/train.py", line 276, in main
train_model(
File "/home/super/lwy/rcbevdet/mmdet3d/apis/train.py", line 351, in train_model
train_detector(
File "/home/super/lwy/rcbevdet/mmdet3d/apis/train.py", line 227, in train_detector
model = MMDistributedDataParallel(
File "/home/super/anaconda3/envs/rcbevdet/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 646, in __init__
_verify_param_shape_across_processes(self.process_group, parameters)
File "/home/super/anaconda3/envs/rcbevdet/lib/python3.8/site-packages/torch/distributed/utils.py", line 89, in _verify_param_shape_across_processes
return dist._verify_params_across_processes(process_group, tensors, logger)
RuntimeError: CUDA error: invalid device function
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 899262) of binary: /home/super/anaconda3/envs/rcbevdet/bin/python
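An "invalid device function" error typically means the CUDA kernels in the installed binaries were not compiled for the GPU's architecture. As a rough sketch (the `sm_*` naming is standard CUDA; treating the 5090 as compute capability 12.0 is an assumption based on the Blackwell generation), the tag to check against `torch.cuda.get_arch_list()` is:

```python
def arch_tag(major, minor):
    """Map a CUDA compute capability (e.g. the tuple returned by
    torch.cuda.get_device_capability()) to the sm_* tag that must appear in
    torch.cuda.get_arch_list() for compiled kernels to run on that GPU."""
    return f"sm_{major}{minor}"

# RTX 5090 (Blackwell) is assumed to report compute capability (12, 0):
print(arch_tag(12, 0))  # sm_120
# If that tag is absent from torch.cuda.get_arch_list(), the binary kernels
# were not built for this GPU and CUDA ops fail with "invalid device function".
```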
Reproduces the problem - command or script
Trying to run on an RTX 5090 GPU.
Reproduces the problem - error message
(Same traceback as above.)
Additional information
No response