```shell
nproc_per_node=8
TOKENIZERS_PARALLELISM=true \
SIZE_FACTOR=8 \
MAX_PIXELS=602112 \
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
NPROC_PER_NODE=$nproc_per_node \
python3 -m swift.cli.main sft \
    --model ../qwen_model/Qwen2_5-14B-Instruct \
    --train_type full \
    --model_type qwen2_5 \
    --dataset ./data/kw1_cot/train_new.jsonl \
    --torch_dtype bfloat16 \
    --num_train_epochs 1 \
    --learning_rate 1e-5 \
    --target_modules all-linear \
    --gradient_accumulation_steps $(expr 16 / $nproc_per_node) \
    --eval_steps 100 \
    --save_steps 100 \
    --save_total_limit 5 \
    --logging_steps 5 \
    --max_length 8192 \
    --output_dir ./data/kw1_cot/output/qwen2_5-14B-Instruct \
    --system 'You are a helpful assistant.' \
    --warmup_ratio 0.05 \
    --dataloader_num_workers 4 \
    --deepspeed zero3
```
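The `--gradient_accumulation_steps $(expr 16 / $nproc_per_node)` expression keeps the effective global batch size fixed as the process count changes: with 8 processes each one accumulates 2 steps, with 2 processes each accumulates 8. A minimal check of that arithmetic:

```shell
# Accumulation steps scale inversely with the number of DDP processes,
# so (per-device batch x accumulation x processes) stays constant.
nproc_per_node=8
expr 16 / $nproc_per_node   # → 2

nproc_per_node=2
expr 16 / $nproc_per_node   # → 8
```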
Environment:
- torch 1.13.1+cu116
- ms-swift 3.0.2.post1
- transformers 4.40.0

I'm hitting the error above — how can I resolve it?
You could try DDP + device_map: launch 2 processes across the 8 GPUs.
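That suggestion might look roughly like the sketch below. It assumes ms-swift's launcher behavior of sharding the model across the leftover GPUs via device_map whenever `NPROC_PER_NODE` is smaller than the number of visible devices — verify against the ms-swift docs for your version:

```shell
# Sketch: 2 DDP processes, each spanning 4 of the 8 GPUs via device_map
# (assumed ms-swift behavior when NPROC_PER_NODE < visible GPU count).
nproc_per_node=2
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
NPROC_PER_NODE=$nproc_per_node \
python3 -m swift.cli.main sft \
    --model ../qwen_model/Qwen2_5-14B-Instruct \
    --train_type full \
    --gradient_accumulation_steps $(expr 16 / $nproc_per_node)
    # remaining flags unchanged from the original command
```

Note that device_map-style model splitting is generally incompatible with DeepSpeed ZeRO sharding, so the `--deepspeed zero3` flag would likely need to be dropped in this configuration.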