
A More User-Friendly Command-Line Interface #238


wisman-tccr opened this issue Jan 25, 2025 · 0 comments

Aim

Integrate a user-friendly command-line interface into PowerInfer. By analyzing the command-line interfaces of well-regarded, widely adopted edge-side inference frameworks (llama.cpp, ollama, and nexa-sdk), this issue proposes an interface that is easy for users to operate, reducing their learning cost and enabling them to get the hang of it quickly.

1. Running a Model

  • llama-run
    • Command Format: llama-run [options] model [prompt]
    • Parameter Explanation:
      • -c, --context-size <value>: Set the context size. The default is 2048
      • -n, --ngl <value>: Specify the number of GPU layers. The default is 0
      • --temp <value>: Set the temperature. The default is 0.8
      • -v, --verbose, --log-verbose: Set the verbosity level for debugging
      • -h, --help: Display help information
    • Model Specification Method: The model string can have a prefix such as huggingface:// (or hf://), ollama://, https://, or file://. If there is no prefix, the string is treated as file:// when the file exists locally and as ollama:// otherwise
    • Examples: llama-run llama3, llama-run ollama://granite-code
  • nexa-sdk
    • Command Format: nexa run MODEL_PATH (By default, it runs GGUF models. To run ONNX models, use nexa onnx MODEL_PATH)
    • Parameter Explanation (taking text generation models as an example):
      • -t, --temperature TEMPERATURE: Temperature
      • -m, --max_new_tokens MAX_NEW_TOKENS: Maximum number of new tokens to generate
      • -k, --top_k TOP_K: Top-k parameter
      • -p, --top_p TOP_P: Top-p parameter
      • -sw, --stop_words [STOP_WORDS ...]: List of stop words to stop generation early
      • --nctx: Set the context size
      • -pf, --profiling: Enable performance profiling
      • -st, --streamlit: Run inference in the Streamlit UI, providing a visual interactive interface
      • -lp, --local_path: Indicate that the provided model path is a local path
      • -mt, --model_type: Specify the model type; options include NLP, COMPUTER_VISION, MULTIMODAL, and AUDIO. When using this option, you must also specify -lp, -hf, or -ms
      • -hf, --huggingface: Load the model from the Hugging Face Hub
      • -ms, --modelscope: Load the model from the ModelScope Hub
    • Model Specification Method: In Nexa, the model string is a path or model identifier; with the -hf flag it is a Hugging Face repository identifier, and with the -ms flag it is a ModelScope model identifier
    • Examples: nexa run llama2, nexa run sd1-4
  • ollama
    • Command Format: ollama run MODEL_NAME
    • Parameter Explanation: There are no additional run-time flags. Model running parameters, such as the temperature and the system message, are customized by creating a Modelfile
    • Model Specification Method: Use the model name directly. The model names are listed on ollama.com/library. You can also use custom model names (created through ollama create)
    • Examples: ollama run llama2, ollama run mario
  • powerinfer (Proposed, similar to ollama)
    • Command Format: powerinfer run MODEL_NAME
    • Parameter Explanation: Customize the model running parameters through a Modelfile (a usage sketch follows this list)
    • Model Specification Method: Use the model name directly. You can use officially provided models or custom model names (created through powerinfer create)
    • Support for text-to-image, TTS, etc.: Can be provided through the UI or CLI (by entering a file path).
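
As a rough sketch of the proposed run interface (nothing here is implemented yet; the model names mirror the examples above and are purely illustrative):

      # Run an officially provided model by name
      powerinfer run llama2
      # Run a custom model previously registered with `powerinfer create`
      powerinfer run mario
      # Multimodal inputs (e.g. for TTS) would be supplied as file paths,
      # per the proposal above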

2. Downloading a Model

  • llama-run: There is no dedicated command to download a model. If the model does not exist when running, it will be automatically downloaded. The file name will have the .partial extension during the download and will be renamed to remove this extension after completion
  • nexa-sdk
    • Command Format: nexa pull MODEL_PATH
    • Parameter Explanation:
      • -hf, --huggingface: Pull the model from the Hugging Face Hub
      • -ms, --modelscope: Pull the model from the ModelScope Hub
      • -o, --output_path OUTPUT_PATH: Specify a custom output path for the pulled model
    • Example: nexa pull llama2
  • ollama
    • Command Format: ollama pull MODEL_NAME, used to pull a model; it can also update a local model (only the differing parts are downloaded)
    • Parameter Explanation: None
    • Example: ollama pull llama2
  • powerinfer (Proposed, similar to nexa-sdk)
    • Command Format: powerinfer pull MODEL_NAME, with incremental updates (an example invocation follows this list)
    • Parameter Explanation:
      • -hf, --huggingface: Pull the model from the Hugging Face Hub
      • -ms, --modelscope: Pull the model from the ModelScope Hub
      • -o, --output_path OUTPUT_PATH: Specify a custom output path for the pulled model
    • Example: powerinfer pull llama2
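
An illustrative invocation of the proposed pull command, using the flags listed above (the Hugging Face repository id is taken from the conversion example later in this issue; the output path is a placeholder):

      # Pull or incrementally update an official model
      powerinfer pull llama2
      # Pull from the Hugging Face Hub into a custom output path
      powerinfer pull meta-llama/Llama-3.2-1B-Instruct -hf -o ./models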

3. Listing Local Models

  • llama-run: There is no dedicated command to list local models
  • nexa-sdk
    • Command Format: nexa list
    • Function: List all local models
  • ollama
    • Command Format: ollama list
    • Function: List all local models
  • powerinfer (Proposed)
    • Command Format: powerinfer list
    • Function: List all local models (a possible output format is sketched after this list)
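
A possible output format, loosely modeled on the tabular output of ollama list (the columns and values below are assumptions, not a spec):

      $ powerinfer list
      NAME      SIZE      MODIFIED
      llama2    3.8 GB    2 days ago
      mario     3.8 GB    5 minutes ago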

4. Deleting a Model

  • llama-run: There is no dedicated command to delete a model
  • nexa-sdk
    • Command Format: nexa remove MODEL_PATH
    • Function: Delete the specified model from the local computer
    • Example: nexa remove llama2
  • ollama
    • Command Format: ollama rm MODEL_NAME
    • Function: Delete the specified model from the local computer
    • Example: ollama rm llama2
  • powerinfer (Proposed)
    • Command Format: powerinfer rm MODEL_NAME
    • Function: Delete the specified model from the local computer
    • Example: powerinfer rm llama2

5. Creating/Converting a Model

  • llama-run: There is no function to create or convert a model
  • nexa-sdk
    • Command Format: nexa convert HF_MODEL_PATH [ftype] [output_file]
    • Parameter Explanation: The nexa-gguf package must be installed. Many parameters control the conversion process, such as the number of threads, the quantization type, and the output tensor type
    • Function: Convert and quantize a Hugging Face model to the GGUF format
    • Example: nexa convert meta-llama/Llama-3.2-1B-Instruct
  • ollama
    • Command Format: ollama create MODEL_NAME -f ./Modelfile
    • Function: Create a model according to the Modelfile. It can be used to import GGUF, PyTorch, or Safetensors models and customize model parameters and prompts
    • Example: To create a custom mario model, first write a Modelfile:
      FROM llama2
      PARAMETER temperature 1
      SYSTEM """
      You are Mario from Super Mario Bros. Answer as Mario, the assistant, only
      """
      Then create the model from it:
      ollama create mario -f ./Modelfile

  • powerinfer (Proposed, similar to ollama)
    • Command Format: powerinfer create MODEL_NAME -f ./Modelfile
    • Function: Create a model from a Modelfile (a sketch follows this list)
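
Since the proposal follows ollama, a PowerInfer Modelfile could plausibly reuse the format of the ollama example above. A hypothetical sketch (no Modelfile format has been specified for PowerInfer yet):

      FROM llama2
      PARAMETER temperature 1
      SYSTEM """
      You are Mario from Super Mario Bros. Answer as Mario, the assistant, only
      """

      powerinfer create mario -f ./Modelfile
      powerinfer run mario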

6. User Authentication and Information Viewing

  • llama-run: There is no user authentication and related information viewing function
  • nexa-sdk
    • login command: nexa login, log in to the Nexa API to obtain access permissions
    • whoami command: nexa whoami, display information about the currently logged-in user
    • logout command: nexa logout, log out of the Nexa API
  • ollama: There is no user authentication and related information viewing function
  • powerinfer (Proposed): None

7. Other Functions

  • llama-run: None
  • nexa-sdk
    • embed command: nexa embed MODEL_PATH prompt, convert text to embeddings
    • server command: nexa server MODEL_PATH, start a local server
    • eval command: nexa eval MODEL_PATH, used to run model evaluation tasks. You can specify the evaluation task type and a limit on the number of evaluation examples
  • ollama
    • cp command: ollama cp llama2 my-llama2, used to copy a model
    • serve command: ollama serve, used to start the ollama service, which lets models be accessed through an HTTP API without running the desktop application (a usage sketch follows this list)
  • powerinfer (Proposed): None for now
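
For reference, a typical ollama serve workflow: start the service, then query a model over its HTTP API (the /api/generate endpoint and port 11434 are ollama's documented defaults; the prompt is illustrative):

      ollama serve &
      curl http://localhost:11434/api/generate -d '{
        "model": "llama2",
        "prompt": "Why is the sky blue?"
      }'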
