Aim
Integrate a user-friendly command-line interface into PowerInfer. By analyzing the command-line interfaces of well-regarded, widely adopted edge-side inference frameworks (llama.cpp, ollama, and nexa-sdk), this proposal outlines an interface that is easy to operate, reducing users' learning costs and letting them become productive quickly.
1. Running a Model
llama-run
Command Format: llama-run [options] model [prompt]
Parameter Explanation:
-c, --context-size <value>: Set the context size. The default is 2048
-n, --ngl <value>: Specify the number of GPU layers. The default is 0
--temp <value>: Set the temperature. The default is 0.8
-v, --verbose, --log-verbose: Set the verbosity level for debugging
-h, --help: Display help information
Model Specification Method: The model string can carry a prefix such as huggingface:// (or hf://), ollama://, https://, or file://. Without a prefix, it defaults to file:// if the file exists locally and to ollama:// otherwise (sketched below)
Examples: llama-run llama3, llama-run ollama://granite-code
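The prefix-resolution rule above is easy to pin down in pseudocode. A minimal Python sketch of the described behaviour (illustrative only; llama-run itself is C++ and the function name here is hypothetical):

```python
import os

def resolve_model_source(model: str) -> str:
    """Resolve a model string following the rule described above: known
    prefixes pass through unchanged; a bare name becomes file:// if it
    exists locally, otherwise ollama://."""
    known_prefixes = ("huggingface://", "hf://", "ollama://", "https://", "file://")
    if model.startswith(known_prefixes):
        return model
    if os.path.exists(model):
        return "file://" + model
    return "ollama://" + model

# e.g. resolve_model_source("granite-code") -> "ollama://granite-code"
```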
nexa-sdk
Command Format: nexa run MODEL_PATH (by default, this runs GGUF models; to run ONNX models, use nexa onnx MODEL_PATH)
Parameter Explanation (taking text generation models as an example):
-t, --temperature TEMPERATURE: Temperature
-m, --max_new_tokens MAX_NEW_TOKENS: Maximum number of new tokens to generate
-k, --top_k TOP_K: Top-k parameter
-p, --top_p TOP_P: Top-p parameter
-sw, --stop_words [STOP_WORDS ...]: List of stop words to stop generation early
--nctx: Set the context size
-pf, --profiling: Enable performance profiling
-st, --streamlit: Run inference in the Streamlit UI, providing a visual interactive interface
-lp, --local_path: Indicate that the provided model path is a local path
-mt, --model_type: Specify the model type; options are NLP, COMPUTER_VISION, MULTIMODAL, and AUDIO. When used, you must also specify -lp, -hf, or -ms
-hf, --huggingface: Load the model from the Hugging Face Hub
-ms, --modelscope: Load the model from the ModelScope Hub
Model Specification Method: the model path or identifier; with the -hf flag, the Hugging Face repository identifier; with the -ms flag, the ModelScope model identifier
Examples: nexa run llama2, nexa run sd1-4
ollama
Command Format: ollama run MODEL_NAME
Parameter Explanation: There are no additional run-time flags. Model parameters such as the temperature and system message are customized by creating a Modelfile
Model Specification Method: Use the model name directly. The model names are listed on ollama.com/library. You can also use custom model names (created through ollama create)
Examples: ollama run llama2, ollama run mario
powerinfer (Proposed, similar to ollama)
Command Format: powerinfer run MODEL_NAME
Parameter Explanation: No additional run-time flags; model parameters are customized through a Modelfile, as in ollama
Model Specification Method: Use the model name directly. You can use officially provided models or custom model names (created through powerinfer create)
Support for text-to-image, TTS, etc.: Can be provided through the UI or CLI (by entering a file path).
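To make the proposed surface concrete, here is a minimal sketch of the top-level subcommand layout using Python's argparse. It is illustrative only: the handler wiring is omitted, and a Python entry point (rather than PowerInfer's C++ one) is an assumption made for readability.

```python
import argparse

def build_cli() -> argparse.ArgumentParser:
    """Proposed top-level CLI surface: run / pull / list / rm / create."""
    parser = argparse.ArgumentParser(
        prog="powerinfer", description="PowerInfer command-line interface (proposed)")
    sub = parser.add_subparsers(dest="command", required=True)

    run = sub.add_parser("run", help="run a model interactively")
    run.add_argument("model", help="model name, as in ollama")
    run.add_argument("prompt", nargs="?", help="optional one-shot prompt")

    pull = sub.add_parser("pull", help="download or incrementally update a model")
    pull.add_argument("model")

    sub.add_parser("list", help="list all local models")

    rm = sub.add_parser("rm", help="delete a local model")
    rm.add_argument("model")

    create = sub.add_parser("create", help="create a model from a Modelfile")
    create.add_argument("model")
    create.add_argument("-f", "--file", default="./Modelfile")
    return parser

if __name__ == "__main__":
    args = build_cli().parse_args()
    print(args)  # real handlers would dispatch on args.command here
```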
2. Downloading a Model
llama-run: There is no dedicated download command. If the model does not exist when running, it is downloaded automatically; the file carries a .partial extension during the download and is renamed to remove it on completion
nexa-sdk
Command Format: nexa pull MODEL_PATH
Parameter Explanation:
-hf, --huggingface: Pull the model from the Hugging Face Hub
-ms, --modelscope: Pull the model from the ModelScope Hub
-o, --output_path OUTPUT_PATH: Specify a custom output path for the pulled model
Example: nexa pull llama2
ollama
Command Format: ollama pull MODEL_NAME, which pulls a model and can also update a local one (only the changed parts are downloaded)
Parameter Explanation: None
Example: ollama pull llama2
powerinfer (Proposed, similar to nexa-sdk)
Command Format: powerinfer pull MODEL_NAME, with incremental updates (only changed parts are downloaded; see the sketch below)
Parameter Explanation:
-hf, --huggingface: Pull the model from the Hugging Face Hub
-ms, --modelscope: Pull the model from the ModelScope Hub
-o, --output_path OUTPUT_PATH: Specify a custom output path for the pulled model
Example: powerinfer pull llama2
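One way to get the incremental behaviour is ollama-style content addressing: a model is a manifest listing layer digests, and pull fetches only the blobs missing locally. A hedged sketch; the manifest format, registry URL, and on-disk layout below are assumptions, not an existing PowerInfer API:

```python
import hashlib, json, os, urllib.request

STORE = os.path.expanduser("~/.powerinfer/blobs")  # hypothetical local blob store
REGISTRY = "https://models.example.com/v2"         # placeholder registry URL

def pull(model: str) -> None:
    """Fetch a model manifest, then download only the layers that are not
    already on disk, so re-pulling an updated model transfers just the diff."""
    with urllib.request.urlopen(f"{REGISTRY}/{model}/manifest.json") as resp:
        manifest = json.load(resp)
    os.makedirs(STORE, exist_ok=True)
    for layer in manifest["layers"]:
        digest = layer["digest"]                    # e.g. "sha256:3f1a..."
        path = os.path.join(STORE, digest.replace(":", "-"))
        if os.path.exists(path):                    # layer unchanged: skip download
            continue
        with urllib.request.urlopen(f"{REGISTRY}/{model}/blobs/{digest}") as resp, \
                open(path + ".partial", "wb") as out:  # .partial until verified, as in llama-run
            data = resp.read()
            out.write(data)
        # verify integrity before the atomic rename makes the layer visible
        assert hashlib.sha256(data).hexdigest() == digest.split(":", 1)[1], "digest mismatch"
        os.replace(path + ".partial", path)
```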
3. Listing Local Models
llama-run: There is no dedicated command to list local models
nexa-sdk
Command Format: nexa list
Function: List all local models
ollama
Command Format: ollama list
Function: List all local models
powerinfer (Proposed)
Command Format: powerinfer list
Function: List all local models
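A sketch of the output powerinfer list could produce, modelled on ollama list's NAME/SIZE/MODIFIED columns; the manifest layout is carried over from the hypothetical pull sketch above:

```python
import json, os, time

MANIFESTS = os.path.expanduser("~/.powerinfer/manifests")  # hypothetical layout

def list_models() -> None:
    """Print local models in an ollama-list style table."""
    print(f"{'NAME':<32}{'SIZE':>10}  MODIFIED")
    for name in sorted(os.listdir(MANIFESTS)):
        path = os.path.join(MANIFESTS, name)
        with open(path) as f:
            manifest = json.load(f)
        size_gb = sum(layer["size"] for layer in manifest["layers"]) / 1e9  # assumes per-layer sizes
        modified = time.strftime("%Y-%m-%d %H:%M", time.localtime(os.stat(path).st_mtime))
        print(f"{name:<32}{size_gb:>8.1f}GB  {modified}")
```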
4. Deleting a Model
llama-run: There is no dedicated command to delete a model
nexa-sdk
Command Format: nexa remove MODEL_PATH
Function: Delete the specified model from the local computer
Example: nexa remove llama2
ollama
Command Format: ollama rm MODEL_NAME
Function: Delete the specified model from the local computer
Example: ollama rm llama2
powerinfer (Proposed)
Command Format: powerinfer rm MODEL_NAME
Function: Delete the specified model from the local computer
Example: powerinfer rm llama2
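If models share content-addressed blobs as in the pull sketch, rm should remove the model's manifest and then garbage-collect only blobs no other model still references, so deleting one model never breaks another. A sketch under the same hypothetical layout:

```python
import json, os

MANIFESTS = os.path.expanduser("~/.powerinfer/manifests")  # hypothetical layout
BLOBS = os.path.expanduser("~/.powerinfer/blobs")

def referenced_digests() -> set:
    """Collect every blob digest still referenced by some remaining model."""
    digests = set()
    for name in os.listdir(MANIFESTS):
        with open(os.path.join(MANIFESTS, name)) as f:
            digests.update(layer["digest"] for layer in json.load(f)["layers"])
    return digests

def rm(model: str) -> None:
    os.remove(os.path.join(MANIFESTS, model))        # drop the model's manifest
    live = referenced_digests()
    for blob in os.listdir(BLOBS):                   # sweep blobs nothing references anymore
        if blob.replace("-", ":", 1) not in live:    # undo the digest-to-filename mangling
            os.remove(os.path.join(BLOBS, blob))
```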
5. Creating/Converting a Model
llama-run: There is no function to create or convert a model
nexa-sdk
Command Format: nexa convert HF_MODEL_PATH [ftype] [output_file]
Function: Convert and quantize a Hugging Face model to the GGUF format
Parameter Explanation: You need to install the nexa-gguf package. Many parameters control the conversion process, such as the number of threads, quantization type, and output tensor type
Example: nexa convert meta-llama/Llama-3.2-1B-Instruct
ollama
Command Format: ollama create MODEL_NAME -f ./Modelfile
Function: Create a model according to the Modelfile. It can be used to import GGUF, PyTorch, or Safetensors models and to customize model parameters and prompts
Example: To create a custom mario model:
FROM llama2
PARAMETER temperature 1
SYSTEM """
You are Mario from Super Mario Bros. Answer as Mario, the assistant, only
"""
ollama create mario -f ./Modelfile
powerinfer (Proposed, similar to ollama)
Command Format: powerinfer create MODEL_NAME -f ./Modelfile
Function: Create a model from a Modelfile, importing models and customizing parameters and prompts
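Whichever tool consumes it, a Modelfile like the one above has to be parsed. A minimal sketch covering just the three directives used in this example (FROM, PARAMETER, and SYSTEM with a triple-quoted block); this is an illustration, not ollama's actual grammar:

```python
def parse_modelfile(text: str) -> dict:
    """Parse the FROM / PARAMETER / SYSTEM directives shown above."""
    spec = {"from": None, "parameters": {}, "system": ""}
    lines = iter(text.splitlines())
    for line in lines:
        if line.startswith("FROM "):
            spec["from"] = line[5:].strip()
        elif line.startswith("PARAMETER "):
            _, key, value = line.split(maxsplit=2)
            spec["parameters"][key] = value
        elif line.startswith("SYSTEM "):
            rest = line[7:].strip()
            if rest.startswith('"""'):               # multi-line block: read until closing quotes
                block = []
                for inner in lines:
                    if inner.strip() == '"""':
                        break
                    block.append(inner)
                spec["system"] = "\n".join(block)
            else:
                spec["system"] = rest.strip('"')
    return spec
```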
6. User Authentication and Information Viewing
llama-run: There is no user authentication or related information-viewing function
nexa-sdk
login command: nexa login, log in to the Nexa API to obtain access permissions
whoami command: nexa whoami, display information about the currently logged-in user
logout command: nexa logout, log out of the Nexa API
ollama: There is no user authentication or related information-viewing function
powerinfer (Proposed): None
7. Other Functions
llama-run: None
nexa-sdk
embed command: nexa embed MODEL_PATH prompt, convert text to embeddings
server command: nexa server MODEL_PATH, start a local server
eval command: nexa eval MODEL_PATH, used to run model evaluation tasks. You can specify the evaluation task type and a limit on the number of evaluation examples
ollama
cp command: ollama cp llama2 my-llama2, used to copy a model
serve command: ollama serve, used to start the ollama service. It allows models to be accessed through APIs or other means without running the desktop application
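For reference, this is how a model behind ollama serve is reached over HTTP: the service listens on port 11434 by default and exposes endpoints such as /api/generate. A proposed powerinfer serve could expose a similar API:

```python
import json, urllib.request

# Minimal client for ollama's REST API; assumes `ollama serve` is running locally.
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps({"model": "llama2",
                     "prompt": "Why is the sky blue?",
                     "stream": False}).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.load(resp)["response"])
```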