
llamafactory latest version 0.9.2.dev0: error when training with unsloth acceleration #6836

Open
1 task done
yecphaha opened this issue Feb 6, 2025 · 0 comments
Labels
bug (Something isn't working), pending (This problem is yet to be addressed)

Comments


yecphaha commented Feb 6, 2025

Reminder

  • I have read the above rules and searched the existing issues.

System Info

llamafactory version: 0.9.2.dev0
Python version: 3.10.16
GPU: A100, 80 GB

Python dependencies

torch=2.4.0
transformers=4.48.2
triton=3.0.0
trl=0.9.6
unsloth=2025.1.8
unsloth_zoo=2025.1.5
xformers=0.0.27.post2

Command

CUDA_VISIBLE_DEVICES=0 llamafactory-cli train yaml_data/qwen2-7b_lora_sft.yaml

YAML config

## model
model_name_or_path: /model/Qwen2-7B-Instruct

## method
stage: sft
do_train: true
finetuning_type: lora
lora_target: all
lora_alpha: 16
lora_dropout: 0.0
lora_rank: 8

## dataset
dataset: sft_2477
template: qwen
cutoff_len: 32768
max_samples: 3000
overwrite_cache: true
preprocessing_num_workers: 16

## output
output_dir: /qwen_model/lora/qwen2_7b_2477
logging_steps: 1
save_steps: 2477
plot_loss: true
overwrite_output_dir: true

## train
per_device_train_batch_size: 1
gradient_accumulation_steps: 1
learning_rate: 7.0e-5
num_train_epochs: 1.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
ddp_timeout: 180000000
neftune_noise_alpha: 16
use_unsloth: true
fp16: true

Reproduction

[INFO|2025-02-06 17:01:18] llamafactory.model.model_utils.checkpointing:157 >> Gradient checkpointing enabled.
[INFO|2025-02-06 17:01:18] llamafactory.model.adapter:157 >> Upcasting trainable params to float32.
[INFO|2025-02-06 17:01:18] llamafactory.model.adapter:157 >> Fine-tuning method: LoRA
[INFO|2025-02-06 17:01:18] llamafactory.model.model_utils.misc:157 >> Found linear modules: o_proj,gate_proj,up_proj,k_proj,v_proj,down_proj,q_proj
[WARNING|logging.py:328] 2025-02-06 17:01:19,863 >> Unsloth 2025.1.8 patched 28 layers with 28 QKV layers, 28 O layers and 28 MLP layers.
[INFO|2025-02-06 17:01:21] llamafactory.model.loader:157 >> trainable params: 20,185,088 || all params: 7,635,801,600 || trainable%: 0.2643
Detected kernel version 3.10.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
[INFO|trainer.py:741] 2025-02-06 17:01:21,223 >> Using auto half precision backend
[WARNING|<string>:215] 2025-02-06 17:01:21,499 >> ==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 2,477 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 1 | Gradient Accumulation steps = 1
\        /    Total batch size = 1 | Total steps = 2,477
 "-____-"     Number of trainable parameters = 20,185,088
  0%|                                                                                                                                              | 0/2477 [00:00<?, ?it/s]/tmp/tmpi9dfeij2/main.c:6:23: fatal error: stdatomic.h: No such file or directory
 #include <stdatomic.h>
                       ^
compilation terminated.
Traceback (most recent call last):
  File "/sie/anaconda3/envs/yecp_main/bin/llamafactory-cli", line 8, in <module>
    sys.exit(main())
  File "/sie/yecp/code/llama_factory_main/src/llamafactory/cli.py", line 112, in main
    run_exp()
  File "/sie/yecp/code/llama_factory_main/src/llamafactory/train/tuner.py", line 92, in run_exp
    _training_function(config={"args": args, "callbacks": callbacks})
  File "/sie/yecp/code/llama_factory_main/src/llamafactory/train/tuner.py", line 66, in _training_function
    run_sft(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
  File "/sie/yecp/code/llama_factory_main/src/llamafactory/train/sft/workflow.py", line 101, in run_sft
    train_result = trainer.train(resume_from_checkpoint=training_args.resume_from_checkpoint)
  File "/sie/anaconda3/envs/yecp_main/lib/python3.10/site-packages/transformers/trainer.py", line 2171, in train
    return inner_training_loop(
  File "<string>", line 382, in _fast_inner_training_loop
  File "<string>", line 31, in _unsloth_training_step
  File "/sie/anaconda3/envs/yecp_main/lib/python3.10/site-packages/unsloth/models/_utils.py", line 1069, in _unsloth_pre_compute_loss
    return self._old_compute_loss(model, inputs, *args, **kwargs)
  File "/sie/anaconda3/envs/yecp_main/lib/python3.10/site-packages/transformers/trainer.py", line 3731, in compute_loss
    outputs = model(**inputs)
  File "/sie/anaconda3/envs/yecp_main/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/sie/anaconda3/envs/yecp_main/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/sie/anaconda3/envs/yecp_main/lib/python3.10/site-packages/accelerate/utils/operations.py", line 823, in forward
    return model_forward(*args, **kwargs)
  File "/sie/anaconda3/envs/yecp_main/lib/python3.10/site-packages/accelerate/utils/operations.py", line 811, in __call__
    return convert_to_fp32(self.model_forward(*args, **kwargs))
  File "/sie/anaconda3/envs/yecp_main/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 43, in decorate_autocast
    return func(*args, **kwargs)
  File "/sie/anaconda3/envs/yecp_main/lib/python3.10/site-packages/torch/_compile.py", line 31, in inner
    return disable_fn(*args, **kwargs)
  File "/sie/anaconda3/envs/yecp_main/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 600, in _fn
    return fn(*args, **kwargs)
  File "/sie/anaconda3/envs/yecp_main/lib/python3.10/site-packages/unsloth/models/llama.py", line 1130, in PeftModelForCausalLM_fast_forward
    return self.base_model(
  File "/sie/anaconda3/envs/yecp_main/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/sie/anaconda3/envs/yecp_main/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/sie/anaconda3/envs/yecp_main/lib/python3.10/site-packages/peft/tuners/tuners_utils.py", line 188, in forward
    return self.model.forward(*args, **kwargs)
  File "/sie/anaconda3/envs/yecp_main/lib/python3.10/site-packages/unsloth/models/llama.py", line 990, in _CausalLM_fast_forward
    outputs = self.model(
  File "/sie/anaconda3/envs/yecp_main/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/sie/anaconda3/envs/yecp_main/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/sie/anaconda3/envs/yecp_main/lib/python3.10/site-packages/unsloth/models/llama.py", line 821, in LlamaModel_fast_forward
    hidden_states = Unsloth_Offloaded_Gradient_Checkpointer.apply(
  File "/sie/anaconda3/envs/yecp_main/lib/python3.10/site-packages/torch/autograd/function.py", line 574, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/sie/anaconda3/envs/yecp_main/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 455, in decorate_fwd
    return fwd(*args, **kwargs)
  File "/sie/anaconda3/envs/yecp_main/lib/python3.10/site-packages/unsloth_zoo/gradient_checkpointing.py", line 147, in forward
    output = forward_function(hidden_states, *args)
  File "/sie/anaconda3/envs/yecp_main/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/sie/anaconda3/envs/yecp_main/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/sie/anaconda3/envs/yecp_main/lib/python3.10/site-packages/unsloth/models/llama.py", line 507, in LlamaDecoderLayer_fast_forward
    hidden_states = fast_rms_layernorm(self.input_layernorm, hidden_states)
  File "/sie/anaconda3/envs/yecp_main/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 600, in _fn
    return fn(*args, **kwargs)
  File "/sie/anaconda3/envs/yecp_main/lib/python3.10/site-packages/unsloth/kernels/rms_layernorm.py", line 210, in fast_rms_layernorm
    out = Fast_RMS_Layernorm.apply(X, W, eps, gemma)
  File "/sie/anaconda3/envs/yecp_main/lib/python3.10/site-packages/torch/autograd/function.py", line 574, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/sie/anaconda3/envs/yecp_main/lib/python3.10/site-packages/unsloth/kernels/rms_layernorm.py", line 156, in forward
    fx[(n_rows,)](
  File "/sie/anaconda3/envs/yecp_main/lib/python3.10/site-packages/triton/runtime/jit.py", line 345, in <lambda>
    return lambda *args, **kwargs: self.run(grid=grid, warmup=False, *args, **kwargs)
  File "/sie/anaconda3/envs/yecp_main/lib/python3.10/site-packages/triton/runtime/jit.py", line 607, in run
    device = driver.active.get_current_device()
  File "/sie/anaconda3/envs/yecp_main/lib/python3.10/site-packages/triton/runtime/driver.py", line 23, in __getattr__
    self._initialize_obj()
  File "/sie/anaconda3/envs/yecp_main/lib/python3.10/site-packages/triton/runtime/driver.py", line 20, in _initialize_obj
    self._obj = self._init_fn()
  File "/sie/anaconda3/envs/yecp_main/lib/python3.10/site-packages/triton/runtime/driver.py", line 9, in _create_driver
    return actives[0]()
  File "/sie/anaconda3/envs/yecp_main/lib/python3.10/site-packages/triton/backends/nvidia/driver.py", line 371, in __init__
    self.utils = CudaUtils()  # TODO: make static
  File "/sie/anaconda3/envs/yecp_main/lib/python3.10/site-packages/triton/backends/nvidia/driver.py", line 80, in __init__
    mod = compile_module_from_src(Path(os.path.join(dirname, "driver.c")).read_text(), "cuda_utils")
  File "/sie/anaconda3/envs/yecp_main/lib/python3.10/site-packages/triton/backends/nvidia/driver.py", line 57, in compile_module_from_src
    so = _build(name, src_path, tmpdir, library_dirs(), include_dir, libraries)
  File "/sie/anaconda3/envs/yecp_main/lib/python3.10/site-packages/triton/runtime/build.py", line 48, in _build
    ret = subprocess.check_call(cc_cmd)
  File "/sie/anaconda3/envs/yecp_main/lib/python3.10/subprocess.py", line 369, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['/usr/bin/gcc', '/tmp/tmpi9dfeij2/main.c', '-O3', '-shared', '-fPIC', '-o', '/tmp/tmpi9dfeij2/cuda_utils.cpython-310-x86_64-linux-gnu.so', '-lcuda', '-L/sie/anaconda3/envs/yecp_main/lib/python3.10/site-packages/triton/backends/nvidia/lib', '-L/lib64', '-L/lib', '-I/sie/anaconda3/envs/yecp_main/lib/python3.10/site-packages/triton/backends/nvidia/include', '-I/tmp/tmpi9dfeij2', '-I/sie/anaconda3/envs/yecp_main/include/python3.10']' returned non-zero exit status 1.
  0%|                                                                                                                                              | 0/2477 [00:00<?, ?it/s]

Others

No response

@yecphaha yecphaha added bug Something isn't working pending This problem is yet to be addressed labels Feb 6, 2025