Aim
Integrate a user-friendly command-line interface into PowerInfer. By analyzing the command-line interfaces of well-regarded, widely adopted edge-side inference frameworks (llama.cpp, ollama, and nexa-sdk), propose an interface that is easy for users to operate, reducing their learning costs and enabling them to get the hang of it quickly.
1. Running a Model
- llama-run
  - Command Format: `llama-run [options] model [prompt]`
  - Parameter Explanation:
    - `-c, --context-size <value>`: Set the context size (default: 2048)
    - `-n, --ngl <value>`: Specify the number of GPU layers (default: 0)
    - `--temp <value>`: Set the temperature (default: 0.8)
    - `-v, --verbose, --log-verbose`: Set the verbosity level for debugging
    - `-h, --help`: Display help information
  - Model Specification Method: the model string can carry a prefix such as `huggingface://` (or `hf://`), `ollama://`, `https://`, or `file://`. With no prefix, it defaults to `file://` if the file exists, otherwise to `ollama://`
  - Examples: `llama-run llama3`, `llama-run ollama://granite-code`
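  A brief illustration of the options above in combination (the file name `my-model.gguf` and the flag values are made up for the example):

  ```sh
  # Explicitly run a local GGUF file, offloading 32 layers to the GPU
  llama-run --ngl 32 -c 4096 file://my-model.gguf "Explain KV caching briefly"

  # No prefix: resolves to file:// if the path exists, otherwise ollama://
  llama-run llama3
  ```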
- nexa-sdk
  - Command Format: `nexa run MODEL_PATH` (runs GGUF models by default; to run ONNX models, use `nexa onnx MODEL_PATH`)
  - Parameter Explanation (taking text generation models as an example):
    - `-t, --temperature TEMPERATURE`: Temperature
    - `-m, --max_new_tokens MAX_NEW_TOKENS`: Maximum number of new tokens to generate
    - `-k, --top_k TOP_K`: Top-k parameter
    - `-p, --top_p TOP_P`: Top-p parameter
    - `-sw, --stop_words [STOP_WORDS ...]`: List of stop words to stop generation early
    - `--nctx`: Set the context size
    - `-pf, --profiling`: Enable performance profiling
    - `-st, --streamlit`: Run inference in the Streamlit UI, providing a visual interactive interface
    - `-lp, --local_path`: Indicate that the provided model path is a local path
    - `-mt, --model_type`: Specify the model running type, one of `NLP`, `COMPUTER_VISION`, `MULTIMODAL`, or `AUDIO`; must be combined with `-lp`, `-hf`, or `-ms`
    - `-hf, --huggingface`: Load the model from the Hugging Face Hub
    - `-ms, --modelscope`: Load the model from the ModelScope Hub
  - Model Specification Method: the path or identifier of the model; with the `-hf` flag, a Hugging Face repository identifier; with the `-ms` flag, a ModelScope model identifier
  - Examples: `nexa run llama2`, `nexa run sd1-4`
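  Combining the sampling flags listed above (the values are illustrative):

  ```sh
  # Run a GGUF text model with custom sampling settings
  nexa run llama2 -t 0.7 -k 40 -p 0.9 -m 256
  ```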
- ollama
  - Command Format: `ollama run MODEL_NAME`
  - Parameter Explanation: there are no additional run-time parameters; model parameters such as the temperature and system message are customized by creating a Modelfile (see section 5)
  - Model Specification Method: use the model name directly. Model names are listed on ollama.com/library; custom model names (created through `ollama create`) can also be used
  - Examples: `ollama run llama2`, `ollama run mario`
- powerinfer (Proposed, similar to ollama)
  - Command Format: `powerinfer run MODEL_NAME`
  - Parameter Explanation: customize the model running parameters through the model file
  - Model Specification Method: use the model name directly, either an officially provided model or a custom model name (created through `powerinfer create`)
  - Support for text-to-image, TTS, etc.: can be provided through the UI or the CLI (by entering a file path). A usage sketch follows.
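  A minimal sketch of the proposed interface in a shell session (the model names and the positional prompt are illustrative assumptions, not a finalized spec):

  ```sh
  # Run an official model interactively
  powerinfer run llama2

  # Run a custom model previously created with `powerinfer create`,
  # passing a one-shot prompt (positional prompt is an assumption)
  powerinfer run mario "Who are you?"
  ```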
2. Downloading a Model
- llama-run: there is no dedicated command to download a model. If the model does not exist when running, it is downloaded automatically; the file carries a `.partial` extension during the download and is renamed to remove this extension after completion
- nexa-sdk
  - Command Format: `nexa pull MODEL_PATH`
  - Parameter Explanation:
    - `-hf, --huggingface`: Pull the model from the Hugging Face Hub
    - `-ms, --modelscope`: Pull the model from the ModelScope Hub
    - `-o, --output_path OUTPUT_PATH`: Specify a custom output path for the pulled model
  - Example: `nexa pull llama2`
- ollama
  - Command Format: `ollama pull MODEL_NAME`; pulls a model, and can also be used to update a local model (only the differing parts are pulled)
  - Parameter Explanation: none
  - Example: `ollama pull llama2`
- powerinfer (Proposed, similar to nexa-sdk)
  - Command Format: `powerinfer pull MODEL_NAME`, with incremental updates
  - Parameter Explanation:
    - `-hf, --huggingface`: Pull the model from the Hugging Face Hub
    - `-ms, --modelscope`: Pull the model from the ModelScope Hub
    - `-o, --output_path OUTPUT_PATH`: Specify a custom output path for the pulled model
  - Example: `powerinfer pull llama2` (see the sketch below)
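  A hedged sketch of the proposed pull flags in use (the Hugging Face repository name is a placeholder):

  ```sh
  # Pull from the default registry; re-running later only fetches changed parts
  powerinfer pull llama2

  # Pull from the Hugging Face Hub into a custom directory (proposed flags)
  powerinfer pull -hf some-org/some-model-gguf -o ~/models
  ```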
3. Listing Local Models
- llama-run: There is no dedicated command to list local models
- nexa-sdk
  - Command Format: `nexa list`
  - Function: List all local models
- ollama
  - Command Format: `ollama list`
  - Function: List all local models
- powerinfer (Proposed)
  - Command Format: `powerinfer list`
  - Function: List all local models (one possible output format is sketched below)
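  A possible `powerinfer list` output, modeled on `ollama list` (the columns and values are assumptions for illustration):

  ```sh
  $ powerinfer list
  NAME            SIZE      MODIFIED
  llama2:latest   3.8 GB    2 days ago
  mario:latest    3.8 GB    5 minutes ago
  ```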
4. Deleting a Model
- llama-run: There is no dedicated command to delete a model
- nexa-sdk
  - Command Format: `nexa remove MODEL_PATH`
  - Function: Delete the specified model from the local computer
  - Example: `nexa remove llama2`
- ollama
  - Command Format: `ollama rm MODEL_NAME`
  - Function: Delete the specified model from the local computer
  - Example: `ollama rm llama2`
- powerinfer (Proposed)
  - Command Format: `powerinfer rm MODEL_NAME`
  - Function: Delete the specified model from the local computer
  - Example: `powerinfer rm llama2`
5. Creating/Converting a Model
- llama-run: There is no function to create or convert a model
- nexa-sdk
  - Command Format: `nexa convert HF_MODEL_PATH [ftype] [output_file]`
  - Parameter Explanation: you need to install the `nexa-gguf` package; there are many parameters to set the conversion process, such as the number of threads, quantization type, and output tensor type
  - Function: Convert and quantize a Hugging Face model to the GGUF format
  - Example: `nexa convert meta-llama/Llama-3.2-1B-Instruct`
- ollama
  - Command Format: `ollama create MODEL_NAME -f ./Modelfile`
  - Function: Create a model according to a Modelfile. It can be used to import GGUF, PyTorch, or Safetensors models and to customize model parameters and prompts
  - Example: to create a custom `mario` model:

    ```
    FROM llama2
    PARAMETER temperature 1
    SYSTEM """
    You are Mario from Super Mario Bros. Answer as Mario, the assistant, only
    """
    ```

    ```sh
    ollama create mario -f ./Modelfile
    ```
- powerinfer (Proposed, similar to ollama)
  - Command Format: `powerinfer create MODEL_NAME -f ./Modelfile`
  - Function: Create a model based on a Modelfile (a sketch of the workflow follows)
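  A hedged sketch of the create-then-run workflow, directly borrowing ollama's Modelfile directives (the `FROM`/`PARAMETER`/`SYSTEM` syntax is an assumption carried over from ollama, not a defined PowerInfer format):

  ```sh
  # Write a hypothetical Modelfile using ollama-style directives
  cat > Modelfile <<'EOF'
  FROM llama2
  PARAMETER temperature 1
  SYSTEM """
  You are Mario from Super Mario Bros. Answer as Mario, the assistant, only
  """
  EOF

  # Build and run the custom model (proposed commands)
  powerinfer create mario -f ./Modelfile
  powerinfer run mario
  ```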
6. User Authentication and Information Viewing
- llama-run: There is no user authentication or related information-viewing function
- nexa-sdk
  - login command: `nexa login`, log in to the Nexa API to obtain access permissions
  - whoami command: `nexa whoami`, display information about the currently logged-in user
  - logout command: `nexa logout`, log out of the Nexa API
- ollama: There is no user authentication or related information-viewing function
- powerinfer (Proposed): None
7. Other Functions
- llama-run: None
- nexa-sdk
  - embed command: `nexa embed MODEL_PATH prompt`, convert text to embeddings
  - server command: `nexa server MODEL_PATH`, start a local server
  - eval command: `nexa eval model_path`, run model evaluation tasks; the evaluation task type and a limit on the number of evaluation examples can be specified
- ollama
  - cp command: `ollama cp llama2 my-llama2`, used to copy a model
  - serve command: `ollama serve`, used to start the ollama service so that models can be accessed through APIs or other means without running the desktop application (see the sketch after this list)
- powerinfer (Proposed): None for now
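To make the serve workflow concrete, a brief sketch of starting the ollama service and querying it over HTTP (`/api/generate` on port 11434 is ollama's default REST endpoint; the prompt is illustrative):

```sh
# Start the ollama service in the background
ollama serve &

# Query the generate endpoint of ollama's REST API
curl http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "Why is the sky blue?"
}'
```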