Meta: Wider model support for PowerInfer #93
Comments
I believe a statistical method could be employed to set all outputs of non-ReLU activation functions that fall below, for instance, the 30th percentile to zero, obtaining sparsity guarantees akin to those provided by ReLU.
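A minimal sketch of that thresholding idea, assuming PyTorch; the 30th-percentile cutoff and the SiLU activation are illustrative choices, not anything PowerInfer ships:

```python
# Hypothetical sketch: induce ReLU-like sparsity in a non-ReLU activation
# by zeroing every output below a chosen percentile (30th here, arbitrary).
import torch
import torch.nn.functional as F

def percentile_sparsify(x: torch.Tensor, pct: float = 30.0) -> torch.Tensor:
    """Zero out all activation values below the given percentile."""
    threshold = torch.quantile(x.float(), pct / 100.0)
    return torch.where(x > threshold, x, torch.zeros_like(x))

# Stand-in for one FFN layer's SiLU activations over a small batch of tokens.
acts = F.silu(torch.randn(4, 11008))
sparse_acts = percentile_sparsify(acts, pct=30.0)
print(f"sparsity: {(sparse_acts == 0).float().mean().item():.1%}")  # about 30%
```

Whether such a cutoff preserves model quality is exactly the open question; the sketch only shows how the sparsity level itself could be enforced.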
It's also important to keep MoE models in mind as you expand PowerInfer's compatibility. The ceiling for consumer-grade GPUs is around a Q3-level quant for an 8x7B, so if PowerInfer can easily handle Q5_K_M or even Q6_K for an 8x7B, that will be really good news. Create a ReLU version of the popular Mixtral Instruct v0.1 (https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1) and the Dolphin fine-tune (https://huggingface.co/cognitivecomputations/dolphin-2.7-mixtral-8x7b), and people will start taking this project seriously.
Thank you for your insight. Actually, we are training Mixtral now. Please wait for our updates. :)
Hi @YixinSong-e. I noticed that you provide ReLU-LLaMA on HF. I ran the model and found that its sparsity (the fraction of values below zero) is much lower than that of OPT models, which can reach as high as 99%; ReLU-LLaMA only achieves about 70~80%. This seems to degrade the sparse matmul, given the much lower sparsity observed.
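A rough sketch of one way such a sparsity figure could be measured (an assumed workflow, not the commenter's actual script): hook the FFN's ReLU and record what fraction of its outputs are zero. Layer sizes here are illustrative.

```python
import torch
import torch.nn as nn

# Toy ReLU FFN standing in for one transformer block's MLP.
ffn = nn.Sequential(nn.Linear(4096, 11008), nn.ReLU())
observed = []

def record_sparsity(module, inputs, output):
    # Fraction of entries that ReLU has zeroed out for this batch.
    observed.append((output == 0).float().mean().item())

ffn[1].register_forward_hook(record_sparsity)
ffn(torch.randn(8, 4096))
print(f"sparsity: {observed[-1]:.1%}")  # ~50% with random weights; real models differ
```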
Hello @llCurious. Yes, for now ReLU-LLaMA has limited sparsity due to its GLU variant, and thus its acceleration ratio is also relatively lower compared to OPT. Interestingly, we found with the ReGLU activation function that even when some activation values are not 0, they can still be ignored. To push more sparsity in GLU-based models, we are currently running some experiments on Mistral, which we will release soon.
Thanks for your reply. I have a question: in my understanding, ReGLU uses element-wise multiplication, which means those zero values after ReLU remain zero, theoretically yielding the same sparsity level as ReLU? BTW, I wonder how you calculate the CDF in Figure 5 (power-law activation).
First, it is right that zero values after ReLU remain zero. Further, some values of the ReLU output, after multiplication with the GLU output, are very close to zero and can also be ignored. We will provide a specific explanation of this phenomenon in a paper (in the coming weeks). Second, we do this by collecting the number of activations of all neurons on a given corpus. Then we calculate the CDF of activation counts by sorting the neurons in descending order of activation counts.
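To make both points concrete, here is a small sketch with assumed shapes and an arbitrary near-zero cutoff (both illustrative, not the values used in the paper): ReGLU keeps ReLU's exact zeros, many surviving entries have tiny magnitudes after the gate multiplication, and the activation-count CDF comes from sorting per-neuron activation counts in descending order.

```python
import torch

hidden, inter, tokens = 4096, 11008, 512
w_gate = torch.randn(hidden, inter) / hidden ** 0.5
w_up = torch.randn(hidden, inter) / hidden ** 0.5
x = torch.randn(tokens, hidden)

gate = torch.relu(x @ w_gate)     # exact zeros introduced by ReLU
reglu = gate * (x @ w_up)         # ReGLU: element-wise gated product

exact_zero = (reglu == 0).float().mean().item()
near_zero = (reglu.abs() < 1e-2).float().mean().item()   # illustrative cutoff
print(f"exact zeros: {exact_zero:.1%}, |value| < 1e-2: {near_zero:.1%}")

# CDF of activation counts: count, per neuron, on how many tokens it fires,
# sort neurons by that count in descending order, then accumulate.
counts = (reglu != 0).sum(dim=0)                  # (inter,) activations per neuron
sorted_counts, _ = counts.sort(descending=True)
cdf = sorted_counts.cumsum(0).float() / counts.sum()
print(f"top 20% of neurons account for {cdf[int(0.2 * inter)].item():.1%} of activations")
```

On random data the counts are near-uniform; on a real corpus the sorted CDF is what exposes the power-law shape discussed above.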
Do you have plans to release the code for the profiler that collects the activation statistics? It would be great to evaluate various models and working points. Thanks!
Hi @hodlen. I noticed that you provide ReLUFalcon-40B on HF. Do you have the tuned ReLU-Falcon-7B weights?
We haven't tuned the Falcon 7B model and currently have no plans to do so. After reviewing benchmark performance, we've opted to focus our tuning efforts on Mistral 7B, which has proven to be a more robust foundation model at this scale.
According to their paper, they use custom profiling and solving code. I too await the release of that code, but it shouldn't be impossible to replicate. As I understand it, the profiler takes a corpus, feeds it to the model, looks at which parts of the model are activated and how often, and then categorizes neurons by how much they are used into hot or cold neurons.
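A rough sketch of that workflow as described in the comment (an assumption about the pipeline, not the released PowerInfer profiler or solver): gather per-neuron activation counts over a corpus, then split neurons into hot and cold by a fixed hot fraction.

```python
import torch

def split_hot_cold(activation_counts: torch.Tensor, hot_fraction: float = 0.2):
    """Return boolean masks (hot, cold) over neurons.

    activation_counts: per-neuron activation counts gathered over a corpus.
    hot_fraction: share of the most frequently firing neurons to keep "hot"
                  (e.g. pinned on the GPU); an illustrative knob, not a tuned value.
    """
    num_hot = int(hot_fraction * activation_counts.numel())
    order = activation_counts.argsort(descending=True)
    hot = torch.zeros_like(activation_counts, dtype=torch.bool)
    hot[order[:num_hot]] = True
    return hot, ~hot

# Example with synthetic, power-law-like counts for one FFN layer.
counts = (torch.rand(11008) ** 4 * 10_000).long()
hot, cold = split_hot_cold(counts, hot_fraction=0.2)
print(hot.sum().item(), "hot neurons;", cold.sum().item(), "cold neurons")
```

The real system presumably also solves a placement problem under memory constraints; this only illustrates the counting-and-splitting step the commenter describes.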
PowerInfer currently optimizes for LLMs (Large Language Models) that utilize the ReLU activation function, leveraging their internal activation locality. However, many of the trending models do not use ReLU activation, creating a significant gap in PowerInfer's applicability.
This ongoing issue tracks our efforts to onboard new LLMs, particularly those in high demand within the community, and to continually enhance our existing ReLU-based LLMs.
Onboarding Progress
We're actively fine-tuning models into ReLU sparse models:
Inviting broader participation, we're also:
Onboarding New Models
We recognize that fine-tuning upstream models is computationally intensive, and the requirement for high-quality data often surpasses our current capabilities. As such, we are actively seeking industrial collaborations to unlock more of PowerInfer's potential and bring state-of-the-art models to a wider audience. For direct inquiries and partnership discussions, please contact us at [email protected].
We will also focus on models that have garnered significant interest in our community 🌟. Your input and feedback are highly valued and encouraged! 💬👍