Meet error in serving with huggingface inference tutorial #16

Closed
@JF-D

Hi Arctic team, great work! I followed the Hugging Face inference tutorial to run inference, but I hit the following error:

Loading checkpoint shards: 100%|██████████| 195/195 [24:34<00:00,  7.56s/it]
WARNING:root:Some parameters are on the meta device device because they were offloaded to the disk.
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:31999 for open-end generation.
Traceback (most recent call last):
  File "/mnt/afs/jfduan/LLMInfer/snowflake-arctic/inference/hf_infer.py", line 28, in <module>
    outputs = model.generate(input_ids=input_ids, max_new_tokens=20)
  File "/mnt/afs/jfduan/env/miniconda3/envs/arctic/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/mnt/afs/jfduan/LLMInfer/transformers-arctic/src/transformers/generation/utils.py", line 1572, in generate
    result = self._greedy_search(
  File "/mnt/afs/jfduan/LLMInfer/transformers-arctic/src/transformers/generation/utils.py", line 2477, in _greedy_search
    outputs = self(
  File "/mnt/afs/jfduan/env/miniconda3/envs/arctic/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/mnt/afs/jfduan/env/miniconda3/envs/arctic/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/afs/jfduan/env/miniconda3/envs/arctic/lib/python3.10/site-packages/accelerate/hooks.py", line 166, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/mnt/afs/jfduan/LLMInfer/transformers-arctic/src/transformers/models/arctic/modeling_arctic.py", line 1708, in forward
    outputs = self.model(
  File "/mnt/afs/jfduan/env/miniconda3/envs/arctic/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/mnt/afs/jfduan/env/miniconda3/envs/arctic/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/afs/jfduan/LLMInfer/transformers-arctic/src/transformers/models/arctic/modeling_arctic.py", line 1397, in forward
    layer_outputs = decoder_layer(
  File "/mnt/afs/jfduan/env/miniconda3/envs/arctic/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/mnt/afs/jfduan/env/miniconda3/envs/arctic/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/afs/jfduan/env/miniconda3/envs/arctic/lib/python3.10/site-packages/accelerate/hooks.py", line 166, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/mnt/afs/jfduan/LLMInfer/transformers-arctic/src/transformers/models/arctic/modeling_arctic.py", line 1087, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/mnt/afs/jfduan/env/miniconda3/envs/arctic/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/mnt/afs/jfduan/env/miniconda3/envs/arctic/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/afs/jfduan/env/miniconda3/envs/arctic/lib/python3.10/site-packages/accelerate/hooks.py", line 166, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/mnt/afs/jfduan/LLMInfer/transformers-arctic/src/transformers/models/arctic/modeling_arctic.py", line 808, in forward
    query_states = self.q_proj(hidden_states)
  File "/mnt/afs/jfduan/env/miniconda3/envs/arctic/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/mnt/afs/jfduan/env/miniconda3/envs/arctic/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/afs/jfduan/env/miniconda3/envs/arctic/lib/python3.10/site-packages/accelerate/hooks.py", line 161, in new_forward
    args, kwargs = module._hf_hook.pre_forward(module, *args, **kwargs)
  File "/mnt/afs/jfduan/env/miniconda3/envs/arctic/lib/python3.10/site-packages/accelerate/hooks.py", line 347, in pre_forward
    set_module_tensor_to_device(
  File "/mnt/afs/jfduan/env/miniconda3/envs/arctic/lib/python3.10/site-packages/accelerate/utils/modeling.py", line 358, in set_module_tensor_to_device
    raise ValueError(
ValueError: Trying to set a tensor of shape torch.Size([7168, 7168]) in "weight" (which has shape torch.Size([100352, 516])), this look incorrect.

Could you help me resolve this? Thanks a lot!
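
For anyone reading the traceback, the two shapes in the ValueError can be compared with a little arithmetic. The sketch below is not part of the original report: it only uses the numbers printed in the error, and `group_size = 512` is a hypothetical value picked because it makes the row counts line up. One possible reading, which is an assumption and not confirmed anywhere in this thread, is that the module's `weight` had already been repacked into rows of 512 values plus 4 extra elements (for example by a quantization step) when the offloaded 7168 x 7168 tensor was loaded back from disk.

```python
# Shapes copied from the ValueError in the traceback above.
incoming = (7168, 7168)    # tensor being set from the offloaded checkpoint
existing = (100352, 516)   # shape "weight" already has inside the module


def numel(shape):
    """Return the number of elements in a tensor of the given shape."""
    n = 1
    for dim in shape:
        n *= dim
    return n


incoming_numel = numel(incoming)
existing_numel = numel(existing)

# Hypothetical group size: an assumption used only to test whether the
# existing shape could be a regrouped view of the incoming tensor.
group_size = 512
groups = incoming_numel // group_size
extra_per_row = existing[1] - group_size

print("incoming elements:", incoming_numel)
print("existing elements:", existing_numel)
print("rows match when grouped by 512:", groups == existing[0])
print("extra elements per row:", extra_per_row)
```

If the row counts match, the mismatch is less likely to be a corrupted checkpoint and more likely to come from the parameter having been transformed in place before the offload hook tried to restore the original tensor.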
