Meta: Wider model support for PowerInfer #93
Comments
I believe a statistical method could be employed to set all outputs of non-ReLU activation functions that fall below, for instance, the 30th percentile to zero, obtaining sparsity guarantees akin to those provided by ReLU.
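A minimal sketch of that thresholding idea, assuming PyTorch; the 30th-percentile cutoff and the SiLU activation are illustrative choices, not anything PowerInfer ships:

```python
# Hypothetical sketch: induce ReLU-like sparsity in a non-ReLU activation
# by zeroing every output below a chosen percentile (30th here, arbitrary).
import torch
import torch.nn.functional as F

def percentile_sparsify(x: torch.Tensor, pct: float = 30.0) -> torch.Tensor:
    """Zero out all activation values below the given percentile."""
    threshold = torch.quantile(x.float(), pct / 100.0)
    return torch.where(x > threshold, x, torch.zeros_like(x))

# Stand-in for one FFN layer's SiLU activations over a small batch of tokens.
acts = F.silu(torch.randn(4, 11008))
sparse_acts = percentile_sparsify(acts, pct=30.0)
print(f"sparsity: {(sparse_acts == 0).float().mean().item():.1%}")  # about 30%
```

Whether such a cutoff preserves model quality is exactly the open question; the sketch only shows how the sparsity level itself could be enforced.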
It's also important to keep MoE models in mind as you expand PowerInfer's compatibility. The ceiling for consumer-grade GPUs is around a Q3-level quant for an 8x7B, so if PowerInfer can easily handle Q5_K_M or even Q6_K for an 8x7B, that will be really good news. Create a ReLU version of the popular Mixtral Instruct v0.1 (https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1) and the Dolphin fine-tune (https://huggingface.co/cognitivecomputations/dolphin-2.7-mixtral-8x7b), and people will start taking this project seriously.
Thank you for your insight. Actually, we are training Mixtral now. Please wait for our updates. :)
Hi @YixinSong-e. I noticed that you provide ReLU-LLaMA on HF. I ran the model and found that its sparsity (the fraction of values below zero) is much lower than that of OPT models, which can reach as high as 99%; ReLU-LLaMA only achieves about 70~80%. This seems to degrade the sparse matmul, given the much lower sparsity observed.
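A rough sketch of one way such a sparsity figure could be measured (an assumed workflow, not the commenter's actual script): hook the FFN's ReLU and record what fraction of its outputs are zero. Layer sizes here are illustrative.

```python
import torch
import torch.nn as nn

# Toy ReLU FFN standing in for one transformer block's MLP.
ffn = nn.Sequential(nn.Linear(4096, 11008), nn.ReLU())
observed = []

def record_sparsity(module, inputs, output):
    # Fraction of entries that ReLU has zeroed out for this batch.
    observed.append((output == 0).float().mean().item())

ffn[1].register_forward_hook(record_sparsity)
ffn(torch.randn(8, 4096))
print(f"sparsity: {observed[-1]:.1%}")  # ~50% with random weights; real models differ
```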
Hello @llCurious. Yes, for now ReLU-LLaMA has limited sparsity due to its GLU variant, and thus its acceleration ratio is also relatively lower compared to OPT. Interestingly, we found with the ReGLU activation function that even when some activation values are not 0, they can still be ignored. To push more sparsity in GLU-based models, we are currently running some experiments on Mistral, which we will release soon.
Thanks for your reply. I have a question: in my understanding, ReGLU uses element-wise multiplication, which means those zero values after ReLU remain zero, theoretically yielding the same sparsity level as ReLU? BTW, I wonder how you calculate the CDF in Figure 5 (power-law activation).
First, it is right that zero values after ReLU remain zero. Further, some values of the ReLU output, after multiplication with the GLU output, are very close to zero and can also be ignored. We will provide a specific explanation of this phenomenon in a paper (in the coming weeks). Second, we do this by collecting the number of activations of all neurons on a given corpus. Then we calculate the CDF of activation counts by sorting the neurons in descending order of activation counts.
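To make both points concrete, here is a small sketch with assumed shapes and an arbitrary near-zero cutoff (both illustrative, not the values used in the paper): ReGLU keeps ReLU's exact zeros, many surviving entries have tiny magnitudes after the gate multiplication, and the activation-count CDF comes from sorting per-neuron activation counts in descending order.

```python
import torch

hidden, inter, tokens = 4096, 11008, 512
w_gate = torch.randn(hidden, inter) / hidden ** 0.5
w_up = torch.randn(hidden, inter) / hidden ** 0.5
x = torch.randn(tokens, hidden)

gate = torch.relu(x @ w_gate)     # exact zeros introduced by ReLU
reglu = gate * (x @ w_up)         # ReGLU: element-wise gated product

exact_zero = (reglu == 0).float().mean().item()
near_zero = (reglu.abs() < 1e-2).float().mean().item()   # illustrative cutoff
print(f"exact zeros: {exact_zero:.1%}, |value| < 1e-2: {near_zero:.1%}")

# CDF of activation counts: count, per neuron, on how many tokens it fires,
# sort neurons by that count in descending order, then accumulate.
counts = (reglu != 0).sum(dim=0)                  # (inter,) activations per neuron
sorted_counts, _ = counts.sort(descending=True)
cdf = sorted_counts.cumsum(0).float() / counts.sum()
print(f"top 20% of neurons account for {cdf[int(0.2 * inter)].item():.1%} of activations")
```

On random data the counts are near-uniform; on a real corpus the sorted CDF is what exposes the power-law shape discussed above.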
Do you have plans to release the code for the profiler that collects the activation statistics? It would be great to evaluate various models and working points. Thanks!
Hi @hodlen. I noticed that you provide ReLUFalcon-40B on HF. Do you have the tuned ReLU-Falcon-7B weights?
We haven't tuned the Falcon 7B model and currently have no plans to do so. After reviewing benchmark performance, we've opted to focus our tuning efforts on Mistral 7B, which has proven to be a more robust foundation model at this scale.
According to their paper, they use custom profiling and solving code. I too await the release of that code, but it shouldn't be impossible to replicate. As I understand it, the profiler takes a corpus, feeds it to the model, looks at which parts of the model are activated and how often, and then categorizes neurons by how much they are used into hot or cold neurons.
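A rough sketch of that workflow as described in the comment (an assumption about the pipeline, not the released PowerInfer profiler or solver): gather per-neuron activation counts over a corpus, then split neurons into hot and cold by a fixed hot fraction.

```python
import torch

def split_hot_cold(activation_counts: torch.Tensor, hot_fraction: float = 0.2):
    """Return boolean masks (hot, cold) over neurons.

    activation_counts: per-neuron activation counts gathered over a corpus.
    hot_fraction: share of the most frequently firing neurons to keep "hot"
                  (e.g. pinned on the GPU); an illustrative knob, not a tuned value.
    """
    num_hot = int(hot_fraction * activation_counts.numel())
    order = activation_counts.argsort(descending=True)
    hot = torch.zeros_like(activation_counts, dtype=torch.bool)
    hot[order[:num_hot]] = True
    return hot, ~hot

# Example with synthetic, power-law-like counts for one FFN layer.
counts = (torch.rand(11008) ** 4 * 10_000).long()
hot, cold = split_hot_cold(counts, hot_fraction=0.2)
print(hot.sum().item(), "hot neurons;", cold.sum().item(), "cold neurons")
```

The real system presumably also solves a placement problem under memory constraints; this only illustrates the counting-and-splitting step the commenter describes.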
PowerInfer currently optimizes for LLMs (Large Language Models) that utilize the ReLU activation function, leveraging their internal activation locality. However, many of the trending models do not use ReLU activation, creating a significant gap in PowerInfer's applicability.
This ongoing issue tracks our efforts to onboard new LLMs, particularly those in high demand within the community, and to continually enhance our existing ReLU-based LLMs.
Onboarding Progress
We're actively fine-tuning models into ReLU sparse models:
Inviting broader participation, we're also:
Onboarding New Models
We recognize that fine-tuning upstream models is computationally intensive, and the requirement for high-quality data often surpasses our current capabilities. As such, we are actively seeking industrial collaborations to unlock more of PowerInfer's potential and bring state-of-the-art models to a wider audience. For direct inquiries and partnership discussions, please contact us at [email protected].
We will also focus on models that have garnered significant interest in our community 🌟. Your input and feedback are highly valued and encouraged! 💬👍