How to run an OPT model with PowerInfer? #234


Open
3 tasks done
wuooo339 opened this issue Dec 24, 2024 · 6 comments
Labels
question Further information is requested

Comments

@wuooo339

wuooo339 commented Dec 24, 2024

Prerequisites

Before submitting your question, please ensure the following:

  • I am running the latest version of PowerInfer. Development is rapid, and as of now, there are no tagged versions.
  • I have carefully read and followed the instructions in the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).

Question Details

I have read the PowerInfer paper and saw that you use OPT-30B in your comparison with llama.cpp, but I cannot find any information about OPT in the README.

Additional Context

I want to test an OPT model using PowerInfer, so I might need your help.
I am from Harbin Institute of Technology, studying HPC (high-performance computing), and I have recently been experimenting with different offloading strategies.

@wuooo339 wuooo339 added the question Further information is requested label Dec 24, 2024
@YixinSong-e
Collaborator

Due to limited bandwidth, OPT support hasn't been merged into the main branch yet. We plan to release it soon and will publish the OPT-related code as soon as possible. Stay tuned.

@Ryuukinn55

> Due to limited bandwidth, this part of the model support hasn't been merged to main branch yet. We plan to release this part recently, and we will release the OPT related code as soon as possible. Stay tuned.

Hi, when are the OPT model support and its related code expected to be released?

@wuooo339
Author

> Due to limited bandwidth, this part of the model support hasn't been merged to main branch yet. We plan to release this part recently, and we will release the OPT related code as soon as possible. Stay tuned.
@YixinSong-e
I can now see the code for OPT models, but how do I obtain the OPT predictor and convert the model for use with PowerInfer?

@AliceRayLu
Contributor

AliceRayLu commented Feb 24, 2025

> > Due to limited bandwidth, this part of the model support hasn't been merged to main branch yet. We plan to release this part recently, and we will release the OPT related code as soon as possible. Stay tuned.
>
> @YixinSong-e
> Now I have seen the code for OPT models but how to get the predictor of OPT and convert it to use PowerInfer?

@wuooo339 @Ryuukinn55 Hi everyone! Our code for the OPT model has been officially released. Our predictor is now available on HuggingFace: https://huggingface.co/PowerInfer/OPT-7B-predictor. For other model sizes, such as 13B or larger, we will release the predictors soon, within the next few days.

You can convert the model from the original version at https://huggingface.co/facebook/opt-6.7b using the convert.py script. First, download the model, and then run the following command:

python convert.py --outfile /PATH/TO/POWERINFER/GGUF/REPO/MODELNAME.powerinfer.gguf /PATH/TO/ORIGINAL/MODEL /PATH/TO/PREDICTOR

For any other questions, please feel free to ask!
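The download-and-convert steps above can be sketched as a small shell script. All paths are placeholders taken from the comment's command template; only `convert.py` and the two HuggingFace repos are named in this thread:

```shell
# Sketch of the conversion workflow described above.
# Paths are illustrative; point them at your local downloads.
ORIG_MODEL=/PATH/TO/ORIGINAL/MODEL   # e.g. a local download of facebook/opt-6.7b
PREDICTOR=/PATH/TO/PREDICTOR         # e.g. a local download of PowerInfer/OPT-7B-predictor
OUTFILE=MODELNAME.powerinfer.gguf

# Print the command rather than running it here, since the model
# downloads are large; drop the 'echo' to execute it for real.
echo python convert.py --outfile "$OUTFILE" "$ORIG_MODEL" "$PREDICTOR"
```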

@wuooo339
Author

@YixinSong-e
I ran into the following problem when running opt-6.7b on a 4080S GPU. The command is
./build/bin/main -m /share-data/wzk-1/model/powerinfer/opt-6.7b.powerinfer.gguf -n 32 -t 8 -p "Paris is the capital city of" --vram-budget 6.9
where opt-6.7b.powerinfer.gguf was converted from https://huggingface.co/facebook/opt-6.7b using the predictor at https://huggingface.co/PowerInfer/OPT-7B-predictor.

llm_load_gpu_split_with_budget: error: activation files under '/share-data/wzk-1/model/powerinfer/activation' not found
llm_load_gpu_split: error: failed to generate gpu split, an empty one will be used
offload_ffn_split: applying augmentation to model - please wait ...

@wuooo339
Author


Sorry, I found the activation files inside the predictor repository; the converted model should be placed in the same directory as those activation files.
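For anyone hitting the same error: per the resolution above, the loader looks for an `activation` directory next to the converted GGUF, so the model file and the predictor's activation files must share a directory. A minimal layout check (paths and filenames are illustrative):

```shell
# Illustrative layout: the converted GGUF and the predictor's
# 'activation' directory must sit side by side.
MODEL_DIR=$(mktemp -d)
mkdir -p "$MODEL_DIR/activation"
: > "$MODEL_DIR/opt-6.7b.powerinfer.gguf"   # stand-in for the real converted model

# Sanity check before launching ./build/bin/main:
[ -d "$MODEL_DIR/activation" ] && echo "activation directory found next to the model"
```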

4 participants