Aim
Integrate a user-friendly command-line interface into PowerInfer. By analyzing the command-line interfaces of well-regarded, widely adopted edge-side inference frameworks (llama.cpp, ollama, and nexa-sdk), this proposal outlines an interface that is easy to operate, reducing users' learning costs and letting them become productive quickly.
1. Running a Model
llama-run
Command Format: llama-run [options] model [prompt]
Parameter Explanation:
-c, --context-size <value>: Set the context size. The default is 2048
-n, --ngl <value>: Specify the number of GPU layers. The default is 0
--temp <value>: Set the temperature. The default is 0.8
-v, --verbose, --log-verbose: Set the verbosity level for debugging
-h, --help: Display help information
Model Specification Method: The model string can carry a prefix such as huggingface:// (or hf://), ollama://, https://, or file://. Without a prefix, it defaults to file:// if the file exists locally and to ollama:// otherwise (sketched below)
Examples: llama-run llama3, llama-run ollama://granite-code
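The prefix-resolution rule above is easy to pin down in pseudocode. A minimal Python sketch of the described behaviour (illustrative only; llama-run itself is C++ and the function name here is hypothetical):

```python
import os

def resolve_model_source(model: str) -> str:
    """Resolve a model string following the rule described above: known
    prefixes pass through unchanged; a bare name becomes file:// if it
    exists locally, otherwise ollama://."""
    known_prefixes = ("huggingface://", "hf://", "ollama://", "https://", "file://")
    if model.startswith(known_prefixes):
        return model
    if os.path.exists(model):
        return "file://" + model
    return "ollama://" + model

# e.g. resolve_model_source("granite-code") -> "ollama://granite-code"
```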
nexa-sdk
Command Format: nexa run MODEL_PATH (by default, this runs GGUF models; to run ONNX models, use nexa onnx MODEL_PATH)
Parameter Explanation (taking text generation models as an example):
-t, --temperature TEMPERATURE: Temperature
-m, --max_new_tokens MAX_NEW_TOKENS: Maximum number of new tokens to generate
-k, --top_k TOP_K: Top-k parameter
-p, --top_p TOP_P: Top-p parameter
-sw, --stop_words [STOP_WORDS ...]: List of stop words to stop generation early
--nctx: Set the context size
-pf, --profiling: Enable performance profiling
-st, --streamlit: Run inference in the Streamlit UI, providing a visual interactive interface
-lp, --local_path: Indicate that the provided model path is a local path
-mt, --model_type: Specify the model type; options are NLP, COMPUTER_VISION, MULTIMODAL, and AUDIO. When used, you must also specify -lp, -hf, or -ms
-hf, --huggingface: Load the model from the Hugging Face Hub
-ms, --modelscope: Load the model from the ModelScope Hub
Model Specification Method: the model path or identifier; with the -hf flag, the Hugging Face repository identifier; with the -ms flag, the ModelScope model identifier
Examples: nexa run llama2, nexa run sd1-4
ollama
Command Format: ollama run MODEL_NAME
Parameter Explanation: There are no additional run-time flags. Model parameters such as the temperature and system message are customized by creating a Modelfile
Model Specification Method: Use the model name directly. The model names are listed on ollama.com/library. You can also use custom model names (created through ollama create)
Examples: ollama run llama2, ollama run mario
powerinfer (Proposed, similar to ollama)
Command Format: powerinfer run MODEL_NAME
Parameter Explanation: No additional run-time flags; model parameters are customized through a Modelfile, as in ollama
Model Specification Method: Use the model name directly. You can use officially provided models or custom model names (created through powerinfer create)
Support for text-to-image, TTS, etc.: Can be provided through the UI or CLI (by entering a file path).
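To make the proposed surface concrete, here is a minimal sketch of the top-level subcommand layout using Python's argparse. It is illustrative only: the handler wiring is omitted, and a Python entry point (rather than PowerInfer's C++ one) is an assumption made for readability.

```python
import argparse

def build_cli() -> argparse.ArgumentParser:
    """Proposed top-level CLI surface: run / pull / list / rm / create."""
    parser = argparse.ArgumentParser(
        prog="powerinfer", description="PowerInfer command-line interface (proposed)")
    sub = parser.add_subparsers(dest="command", required=True)

    run = sub.add_parser("run", help="run a model interactively")
    run.add_argument("model", help="model name, as in ollama")
    run.add_argument("prompt", nargs="?", help="optional one-shot prompt")

    pull = sub.add_parser("pull", help="download or incrementally update a model")
    pull.add_argument("model")

    sub.add_parser("list", help="list all local models")

    rm = sub.add_parser("rm", help="delete a local model")
    rm.add_argument("model")

    create = sub.add_parser("create", help="create a model from a Modelfile")
    create.add_argument("model")
    create.add_argument("-f", "--file", default="./Modelfile")
    return parser

if __name__ == "__main__":
    args = build_cli().parse_args()
    print(args)  # real handlers would dispatch on args.command here
```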
2. Downloading a Model
llama-run: There is no dedicated download command. If the model does not exist when running, it is downloaded automatically; the file carries a .partial extension during the download and is renamed to remove it on completion
nexa-sdk
Command Format: nexa pull MODEL_PATH
Parameter Explanation:
-hf, --huggingface: Pull the model from the Hugging Face Hub
-ms, --modelscope: Pull the model from the ModelScope Hub
-o, --output_path OUTPUT_PATH: Specify a custom output path for the pulled model
Example: nexa pull llama2
ollama
Command Format: ollama pull MODEL_NAME, which pulls a model and can also update a local one (only the changed parts are downloaded)
Parameter Explanation: None
Example: ollama pull llama2
powerinfer (Proposed, similar to nexa-sdk)
Command Format: powerinfer pull MODEL_NAME, with incremental updates (only changed parts are downloaded; see the sketch below)
Parameter Explanation:
-hf, --huggingface: Pull the model from the Hugging Face Hub
-ms, --modelscope: Pull the model from the ModelScope Hub
-o, --output_path OUTPUT_PATH: Specify a custom output path for the pulled model
Example: powerinfer pull llama2
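One way to get the incremental behaviour is ollama-style content addressing: a model is a manifest listing layer digests, and pull fetches only the blobs missing locally. A hedged sketch; the manifest format, registry URL, and on-disk layout below are assumptions, not an existing PowerInfer API:

```python
import hashlib, json, os, urllib.request

STORE = os.path.expanduser("~/.powerinfer/blobs")  # hypothetical local blob store
REGISTRY = "https://models.example.com/v2"         # placeholder registry URL

def pull(model: str) -> None:
    """Fetch a model manifest, then download only the layers that are not
    already on disk, so re-pulling an updated model transfers just the diff."""
    with urllib.request.urlopen(f"{REGISTRY}/{model}/manifest.json") as resp:
        manifest = json.load(resp)
    os.makedirs(STORE, exist_ok=True)
    for layer in manifest["layers"]:
        digest = layer["digest"]                    # e.g. "sha256:3f1a..."
        path = os.path.join(STORE, digest.replace(":", "-"))
        if os.path.exists(path):                    # layer unchanged: skip download
            continue
        with urllib.request.urlopen(f"{REGISTRY}/{model}/blobs/{digest}") as resp, \
                open(path + ".partial", "wb") as out:  # .partial until verified, as in llama-run
            data = resp.read()
            out.write(data)
        # verify integrity before the atomic rename makes the layer visible
        assert hashlib.sha256(data).hexdigest() == digest.split(":", 1)[1], "digest mismatch"
        os.replace(path + ".partial", path)
```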
3. Listing Local Models
llama-run: There is no dedicated command to list local models
nexa-sdk
Command Format: nexa list
Function: List all local models
ollama
Command Format: ollama list
Function: List all local models
powerinfer (Proposed)
Command Format: powerinfer list
Function: List all local models
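A sketch of the output powerinfer list could produce, modelled on ollama list's NAME/SIZE/MODIFIED columns; the manifest layout is carried over from the hypothetical pull sketch above:

```python
import json, os, time

MANIFESTS = os.path.expanduser("~/.powerinfer/manifests")  # hypothetical layout

def list_models() -> None:
    """Print local models in an ollama-list style table."""
    print(f"{'NAME':<32}{'SIZE':>10}  MODIFIED")
    for name in sorted(os.listdir(MANIFESTS)):
        path = os.path.join(MANIFESTS, name)
        with open(path) as f:
            manifest = json.load(f)
        size_gb = sum(layer["size"] for layer in manifest["layers"]) / 1e9  # assumes per-layer sizes
        modified = time.strftime("%Y-%m-%d %H:%M", time.localtime(os.stat(path).st_mtime))
        print(f"{name:<32}{size_gb:>8.1f}GB  {modified}")
```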
4. Deleting a Model
llama-run: There is no dedicated command to delete a model
nexa-sdk
Command Format: nexa remove MODEL_PATH
Function: Delete the specified model from the local computer
Example: nexa remove llama2
ollama
Command Format: ollama rm MODEL_NAME
Function: Delete the specified model from the local computer
Example: ollama rm llama2
powerinfer (Proposed)
Command Format: powerinfer rm MODEL_NAME
Function: Delete the specified model from the local computer
Example: powerinfer rm llama2
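If models share content-addressed blobs as in the pull sketch, rm should remove the model's manifest and then garbage-collect only blobs no other model still references, so deleting one model never breaks another. A sketch under the same hypothetical layout:

```python
import json, os

MANIFESTS = os.path.expanduser("~/.powerinfer/manifests")  # hypothetical layout
BLOBS = os.path.expanduser("~/.powerinfer/blobs")

def referenced_digests() -> set:
    """Collect every blob digest still referenced by some remaining model."""
    digests = set()
    for name in os.listdir(MANIFESTS):
        with open(os.path.join(MANIFESTS, name)) as f:
            digests.update(layer["digest"] for layer in json.load(f)["layers"])
    return digests

def rm(model: str) -> None:
    os.remove(os.path.join(MANIFESTS, model))        # drop the model's manifest
    live = referenced_digests()
    for blob in os.listdir(BLOBS):                   # sweep blobs nothing references anymore
        if blob.replace("-", ":", 1) not in live:    # undo the digest-to-filename mangling
            os.remove(os.path.join(BLOBS, blob))
```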
5. Creating/Converting a Model
llama-run: There is no function to create or convert a model
nexa-sdk
Command Format: nexa convert HF_MODEL_PATH [ftype] [output_file]
Function: Convert and quantize a Hugging Face model to the GGUF format
Parameter Explanation: You need to install the nexa-gguf package. Many parameters control the conversion process, such as the number of threads, quantization type, and output tensor type
Example: nexa convert meta-llama/Llama-3.2-1B-Instruct
ollama
Command Format: ollama create MODEL_NAME -f ./Modelfile
Function: Create a model according to the Modelfile. It can be used to import GGUF, PyTorch, or Safetensors models and to customize model parameters and prompts
Example: To create a custom mario model:
FROM llama2
PARAMETER temperature 1
SYSTEM """
You are Mario from Super Mario Bros. Answer as Mario, the assistant, only
"""
ollama create mario -f ./Modelfile
powerinfer (Proposed, similar to ollama)
Command Format: powerinfer create MODEL_NAME -f ./Modelfile
Function: Create a model from a Modelfile, importing models and customizing parameters and prompts
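Whichever tool consumes it, a Modelfile like the one above has to be parsed. A minimal sketch covering just the three directives used in this example (FROM, PARAMETER, and SYSTEM with a triple-quoted block); this is an illustration, not ollama's actual grammar:

```python
def parse_modelfile(text: str) -> dict:
    """Parse the FROM / PARAMETER / SYSTEM directives shown above."""
    spec = {"from": None, "parameters": {}, "system": ""}
    lines = iter(text.splitlines())
    for line in lines:
        if line.startswith("FROM "):
            spec["from"] = line[5:].strip()
        elif line.startswith("PARAMETER "):
            _, key, value = line.split(maxsplit=2)
            spec["parameters"][key] = value
        elif line.startswith("SYSTEM "):
            rest = line[7:].strip()
            if rest.startswith('"""'):               # multi-line block: read until closing quotes
                block = []
                for inner in lines:
                    if inner.strip() == '"""':
                        break
                    block.append(inner)
                spec["system"] = "\n".join(block)
            else:
                spec["system"] = rest.strip('"')
    return spec
```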
6. User Authentication and Information Viewing
llama-run: There is no user authentication or related information-viewing function
nexa-sdk
login command: nexa login, log in to the Nexa API to obtain access permissions
whoami command: nexa whoami, display information about the currently logged-in user
logout command: nexa logout, log out of the Nexa API
ollama: There is no user authentication or related information-viewing function
powerinfer (Proposed): None
7. Other Functions
llama-run: None
nexa-sdk
embed command: nexa embed MODEL_PATH prompt, convert text to embeddings
server command: nexa server MODEL_PATH, start a local server
eval command: nexa eval MODEL_PATH, used to run model evaluation tasks. You can specify the evaluation task type and a limit on the number of evaluation examples
ollama
cp command: ollama cp llama2 my-llama2, used to copy a model
serve command: ollama serve, used to start the ollama service. It allows models to be accessed through APIs or other means without running the desktop application
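For reference, this is how a model behind ollama serve is reached over HTTP: the service listens on port 11434 by default and exposes endpoints such as /api/generate. A proposed powerinfer serve could expose a similar API:

```python
import json, urllib.request

# Minimal client for ollama's REST API; assumes `ollama serve` is running locally.
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps({"model": "llama2",
                     "prompt": "Why is the sky blue?",
                     "stream": False}).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.load(resp)["response"])
```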