How to run an OPT model with PowerInfer? #234


Open
3 tasks done
wuooo339 opened this issue Dec 24, 2024 · 6 comments
Labels
question Further information is requested

Comments

@wuooo339

wuooo339 commented Dec 24, 2024

Prerequisites

Before submitting your question, please ensure the following:

  • I am running the latest version of PowerInfer. Development is rapid, and as of now, there are no tagged versions.
  • I have carefully read and followed the instructions in the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).

Question Details

I have read the PowerInfer paper and saw that you use OPT-30B in your comparison with llama.cpp, but I cannot find any information about OPT in the README.

Additional Context

I want to test an OPT model using PowerInfer, so I might need your help.
I am from Harbin Institute of Technology, studying HPC (high-performance computing), and I have recently been experimenting with different offloading strategies.

@wuooo339 wuooo339 added the question Further information is requested label Dec 24, 2024
@YixinSong-e
Collaborator

Due to limited bandwidth, OPT support hasn't been merged into the main branch yet. We plan to release it soon and will publish the OPT-related code as soon as possible. Stay tuned.

@Ryuukinn55

> Due to limited bandwidth, this part of the model support hasn't been merged to main branch yet. We plan to release this part recently, and we will release the OPT related code as soon as possible. Stay tuned.

Hi, when are the OPT model support and its related code expected to be released?

@wuooo339
Author

> Due to limited bandwidth, this part of the model support hasn't been merged to main branch yet. We plan to release this part recently, and we will release the OPT related code as soon as possible. Stay tuned.
@YixinSong-e
I can now see the code for OPT models, but how do I obtain the OPT predictor and convert the model for use with PowerInfer?

@AliceRayLu
Contributor

AliceRayLu commented Feb 24, 2025

> > Due to limited bandwidth, this part of the model support hasn't been merged to main branch yet. We plan to release this part recently, and we will release the OPT related code as soon as possible. Stay tuned.
>
> @YixinSong-e
> Now I have seen the code for OPT models but how to get the predictor of OPT and convert it to use PowerInfer?

@wuooo339 @Ryuukinn55 Hi everyone! Our code for the OPT model has been officially released. Our predictor is now available on HuggingFace: https://huggingface.co/PowerInfer/OPT-7B-predictor. For other model sizes, such as 13B or larger, we will release the predictors soon, within the next few days.

You can convert the model from the original version at https://huggingface.co/facebook/opt-6.7b using the convert.py script. First, download the model, and then run the following command:

python convert.py --outfile /PATH/TO/POWERINFER/GGUF/REPO/MODELNAME.powerinfer.gguf /PATH/TO/ORIGINAL/MODEL /PATH/TO/PREDICTOR

For any other questions, please feel free to ask!
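The download-and-convert steps above can be sketched as a small shell script. All paths are placeholders taken from the comment's command template; only `convert.py` and the two HuggingFace repos are named in this thread:

```shell
# Sketch of the conversion workflow described above.
# Paths are illustrative; point them at your local downloads.
ORIG_MODEL=/PATH/TO/ORIGINAL/MODEL   # e.g. a local download of facebook/opt-6.7b
PREDICTOR=/PATH/TO/PREDICTOR         # e.g. a local download of PowerInfer/OPT-7B-predictor
OUTFILE=MODELNAME.powerinfer.gguf

# Print the command rather than running it here, since the model
# downloads are large; drop the 'echo' to execute it for real.
echo python convert.py --outfile "$OUTFILE" "$ORIG_MODEL" "$PREDICTOR"
```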

@wuooo339
Author

@YixinSong-e
I ran into the following problem when running opt-6.7b on a 4080S GPU. The command is
./build/bin/main -m /share-data/wzk-1/model/powerinfer/opt-6.7b.powerinfer.gguf -n 32 -t 8 -p "Paris is the capital city of" --vram-budget 6.9
where opt-6.7b.powerinfer.gguf was converted from https://huggingface.co/facebook/opt-6.7b using the predictor at https://huggingface.co/PowerInfer/OPT-7B-predictor.

llm_load_gpu_split_with_budget: error: activation files under '/share-data/wzk-1/model/powerinfer/activation' not found
llm_load_gpu_split: error: failed to generate gpu split, an empty one will be used
offload_ffn_split: applying augmentation to model - please wait ...

@wuooo339
Author


Sorry, I found the activation files inside the predictor repository; the converted model should be placed in the same directory as those activation files.
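For anyone hitting the same error: per the resolution above, the loader looks for an `activation` directory next to the converted GGUF, so the model file and the predictor's activation files must share a directory. A minimal layout check (paths and filenames are illustrative):

```shell
# Illustrative layout: the converted GGUF and the predictor's
# 'activation' directory must sit side by side.
MODEL_DIR=$(mktemp -d)
mkdir -p "$MODEL_DIR/activation"
: > "$MODEL_DIR/opt-6.7b.powerinfer.gguf"   # stand-in for the real converted model

# Sanity check before launching ./build/bin/main:
[ -d "$MODEL_DIR/activation" ] && echo "activation directory found next to the model"
```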

4 participants