Commit def1c18

Updates for issues in #1789
1 parent fc5a703 commit def1c18

12 files changed, +27 -25 lines

docker_build_script_ubuntu.sh

Lines changed: 2 additions & 2 deletions
@@ -41,8 +41,8 @@ wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh && \
 
 # if building for CPU, would remove CMAKE_ARGS and avoid GPU image as base image
 # Choose llama_cpp_python ARGS for your system according to [llama_cpp_python backend documentation](https://github.com/abetlen/llama-cpp-python?tab=readme-ov-file#supported-backends), e.g. for CUDA:
-export LLAMA_CUBLAS=1
-export CMAKE_ARGS="-DLLAMA_CUBLAS=on -DCMAKE_CUDA_ARCHITECTURES=all"
+export GGML_CUDA=1
+export CMAKE_ARGS="-DGGML_CUDA=on -DCMAKE_CUDA_ARCHITECTURES=all"
 # for Metal MAC M1/M2 comment out above two lines and uncomment out the below line
 # export CMAKE_ARGS="-DLLAMA_METAL=on"
 export FORCE_CMAKE=1
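These exports only take effect when llama_cpp_python is rebuilt from source. A minimal sketch of such a rebuild with the renamed flags (package names and the 0.2.87 pin come from other files in this commit; the exact invocation is an assumption, not part of this hunk):

```bash
# Sketch: rebuild llama_cpp_python from source with CUDA enabled,
# using the renamed GGML_CUDA flags (adjust versions/arches for your system).
export GGML_CUDA=1
export CMAKE_ARGS="-DGGML_CUDA=on -DCMAKE_CUDA_ARCHITECTURES=all"
export FORCE_CMAKE=1
pip uninstall -y llama_cpp_python llama_cpp_python_cuda
pip install llama_cpp_python==0.2.87 --no-cache-dir
```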

docs/FAQ.md

Lines changed: 3 additions & 3 deletions
@@ -2538,7 +2538,7 @@ on CPU, or for GPU:
 git clone https://github.com/ggerganov/llama.cpp
 cd llama.cpp
 make clean
-make LLAMA_CUBLAS=1
+make GGML_CUDA=1
 ```
 etc. following different [scenarios](https://github.com/ggerganov/llama.cpp#build).
 
@@ -2928,8 +2928,8 @@ Other workarounds:
 * Workaround 2: Follow normal directions for installation, but replace 0.2.76 with 0.2.26, e.g. for CUDA with Linux:
 ```bash
 pip uninstall llama_cpp_python llama_cpp_python_cuda -y
-export LLAMA_CUBLAS=1
-export CMAKE_ARGS="-DLLAMA_CUBLAS=on -DCMAKE_CUDA_ARCHITECTURES=all"
+export GGML_CUDA=1
+export CMAKE_ARGS="-DGGML_CUDA=on -DCMAKE_CUDA_ARCHITECTURES=all"
 export FORCE_CMAKE=1
 pip install llama_cpp_python==0.2.26 --no-cache-dir
 ```
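The `make GGML_CUDA=1` change above tracks llama.cpp's rename of its CUDA build flag from `LLAMA_CUBLAS` to `GGML_CUDA`. On recent llama.cpp checkouts that favor CMake over the Makefile, a roughly equivalent build would look like the sketch below (an assumption about the upstream build system, not part of this commit):

```bash
# Sketch: CMake-based llama.cpp build with CUDA, assuming a recent checkout
# where the CUDA option is named GGML_CUDA.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build -DGGML_CUDA=on
cmake --build build --config Release -j
```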

docs/README_DOCKER.md

Lines changed: 1 addition & 1 deletion
@@ -74,7 +74,7 @@ For example, for Metal M1/M2 support of llama.cpp GGUF files, one should change
 ```bash
 export CMAKE_ARGS="-DLLAMA_METAL=on"
 ```
-and remove `LLAMA_CUBLAS=1`, so that the docker image is Metal Compatible for llama.cpp GGUF files. Otherwise, Torch supports Metal M1/M2 directly without changes.
+and remove `GGML_CUDA=1`, so that the docker image is Metal Compatible for llama.cpp GGUF files. Otherwise, Torch supports Metal M1/M2 directly without changes.
 
 ### Build

docs/README_LINUX.md

Lines changed: 2 additions & 2 deletions
@@ -110,8 +110,8 @@ sudo sh cuda_12.1.1_530.30.02_linux.run
 
 * Choose llama_cpp_python ARGS for your system according to [llama_cpp_python backend documentation](https://github.com/abetlen/llama-cpp-python?tab=readme-ov-file#supported-backends), e.g. for CUDA:
 ```bash
-export LLAMA_CUBLAS=1
-export CMAKE_ARGS="-DLLAMA_CUBLAS=on -DCMAKE_CUDA_ARCHITECTURES=all"
+export GGML_CUDA=1
+export CMAKE_ARGS="-DGGML_CUDA=on -DCMAKE_CUDA_ARCHITECTURES=all"
 export FORCE_CMAKE=1
 ```
 Note for some reason things will fail with llama_cpp_python if don't add all cuda arches, and building with all those arches does take some time.
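To confirm the resulting wheel was actually built with CUDA offload, one option is the check below (a sketch; it assumes the installed llama_cpp_python re-exports llama.cpp's `llama_supports_gpu_offload` helper, which may not hold for every version):

```bash
# Sketch: check that llama_cpp_python was built with GPU offload support.
# Assumes the binding re-exports llama.cpp's llama_supports_gpu_offload().
python -c "import llama_cpp; print(llama_cpp.llama_supports_gpu_offload())"
```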

docs/README_WHEEL.md

Lines changed: 4 additions & 4 deletions
@@ -13,8 +13,8 @@ Install in fresh env, avoiding being inside h2ogpt directory or a directory wher
 ```bash
 export CUDA_HOME=/usr/local/cuda-12.1
 export PIP_EXTRA_INDEX_URL="https://download.pytorch.org/whl/cu121 https://huggingface.github.io/autogptq-index/whl/cu121"
-set CMAKE_ARGS=-DLLAMA_CUBLAS=on -DCMAKE_CUDA_ARCHITECTURES=all
-set LLAMA_CUBLAS=1
+set CMAKE_ARGS=-DGGML_CUDA=on -DCMAKE_CUDA_ARCHITECTURES=all
+set GGML_CUDA=1
 set FORCE_CMAKE=1
 ```
 for the cmake args, choose e llama_cpp_python ARGS for your system according to [llama_cpp_python backend documentation](https://github.com/abetlen/llama-cpp-python?tab=readme-ov-file#supported-backends). Note for some reason things will fail with llama_cpp_python if don't add all cuda arches, and building with all those arches does take some time.
@@ -37,7 +37,7 @@ conda install weasyprint pygobject -c conda-forge -y
 ```
 second run:
 ```bash
-export CMAKE_ARGS="-DLLAMA_CUBLAS=on -DCMAKE_CUDA_ARCHITECTURES=all"
+export CMAKE_ARGS="-DGGML_CUDA=on -DCMAKE_CUDA_ARCHITECTURES=all"
 export CUDA_HOME=/usr/local/cuda-12.1
 export PIP_EXTRA_INDEX_URL="https://download.pytorch.org/whl/cu121 https://huggingface.github.io/autogptq-index/whl/cu121"
 pip install h2ogpt==0.2.0[cuda] --index-url https://downloads.h2ogpt.h2o.ai --extra-index-url https://pypi.org/simple --no-cache
@@ -65,7 +65,7 @@ which can be installed with basic CUDA support like:
 ```bash
 # For other GPUs etc. see: https://github.com/abetlen/llama-cpp-python?tab=readme-ov-file#supported-backends
 # required for PyPi wheels that do not allow URLs, so uses generic llama_cpp_python package:
-export CMAKE_ARGS="-DLLAMA_CUBLAS=on -DCMAKE_CUDA_ARCHITECTURES=all"
+export CMAKE_ARGS="-DGGML_CUDA=on -DCMAKE_CUDA_ARCHITECTURES=all"
 export CUDA_HOME=/usr/local/cuda-12.1
 export PIP_EXTRA_INDEX_URL="https://download.pytorch.org/whl/cu121 https://huggingface.github.io/autogptq-index/whl/cu121"
 # below [cuda] assumes CUDA 12.1 for some packages like AutoAWQ etc.

docs/README_WINDOWS.md

Lines changed: 2 additions & 2 deletions
@@ -60,8 +60,8 @@
 ```
 * For non-CPU case, choose llama_cpp_python ARGS for your system according to [llama_cpp_python backend documentation](https://github.com/abetlen/llama-cpp-python?tab=readme-ov-file#supported-backends), e.g. for CUDA:
 ```cmdline
-set CMAKE_ARGS=-DLLAMA_CUBLAS=on -DCMAKE_CUDA_ARCHITECTURES=all
-set LLAMA_CUBLAS=1
+set CMAKE_ARGS=-DGGML_CUDA=on -DCMAKE_CUDA_ARCHITECTURES=all
+set GGML_CUDA=1
 set FORCE_CMAKE=1
 ```
 Note for some reason things will fail with llama_cpp_python if don't add all cuda arches, and building with all those arches does take some time.

docs/README_quickstart.md

Lines changed: 4 additions & 4 deletions
@@ -17,15 +17,15 @@ To quickly try out h2oGPT with limited document Q/A capability, create a fresh P
 Then choose your llama_cpp_python options, by changing `CMAKE_ARGS` to whichever system you have according to [llama_cpp_python backend documentation](https://github.com/abetlen/llama-cpp-python?tab=readme-ov-file#supported-backends).
 E.g. CUDA on Linux:
 ```bash
-export LLAMA_CUBLAS=1
-export CMAKE_ARGS="-DLLAMA_CUBLAS=on -DCMAKE_CUDA_ARCHITECTURES=all"
+export GGML_CUDA=1
+export CMAKE_ARGS="-DGGML_CUDA=on -DCMAKE_CUDA_ARCHITECTURES=all"
 export FORCE_CMAKE=1
 ```
 Note for some reason things will fail with llama_cpp_python if don't add all cuda arches, and building with all those arches does take some time.
 Windows CUDA:
 ```cmdline
-set CMAKE_ARGS=-DLLAMA_CUBLAS=on -DCMAKE_CUDA_ARCHITECTURES=all
-set LLAMA_CUBLAS=1
+set CMAKE_ARGS=-DGGML_CUDA=on -DCMAKE_CUDA_ARCHITECTURES=all
+set GGML_CUDA=1
 set FORCE_CMAKE=1
 ```
 Note for some reason things will fail with llama_cpp_python if don't add all cuda arches, and building with all those arches does take some time.

docs/linux_install_full.sh

Lines changed: 2 additions & 2 deletions
@@ -53,8 +53,8 @@ conda install python=3.10 -c conda-forge -y
 
 export CUDA_HOME=/usr/local/cuda-12.1
 export PIP_EXTRA_INDEX_URL="https://download.pytorch.org/whl/cu121"
-export LLAMA_CUBLAS=1
-export CMAKE_ARGS="-DLLAMA_CUBLAS=on -DCMAKE_CUDA_ARCHITECTURES=all"
+export GGML_CUDA=1
+export CMAKE_ARGS="-DGGML_CUDA=on -DCMAKE_CUDA_ARCHITECTURES=all"
 export FORCE_CMAKE=1
 
 # get patches

reqs_optional/reqs_constraints.txt

Lines changed: 2 additions & 1 deletion
@@ -1,5 +1,6 @@
 # ensure doesn't drift, e.g. Issue #1348
-torch==2.3.1
+torch==2.2.1 ;sys_platform != "darwin" and platform_machine != "arm64"
+torch==2.3.1; sys_platform == "darwin" and platform_machine == "arm64"
 gradio==4.26.0
 gradio_client==0.15.1
 transformers>=4.43.2
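If this file is applied as a pip constraints file, as its name suggests (an assumption about how the install scripts consume it, not shown in this diff), the platform-split torch pins would take effect like this:

```bash
# Sketch: apply the constraints file so whichever torch pin matches this
# platform's environment markers wins (paths assumed from this repo layout).
pip install -r requirements.txt -c reqs_optional/reqs_constraints.txt
```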
Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 gpt4all==1.0.5
 
 # requires env to be set for specific systems
-llama-cpp-python==0.2.85
+llama-cpp-python==0.2.87

requirements.txt

Lines changed: 3 additions & 2 deletions
@@ -17,7 +17,7 @@ huggingface_hub>=0.23.3
 appdirs>=1.4.4
 fire>=0.5.0
 docutils>=0.20.1
-torch==2.3.1; sys_platform != "darwin" and platform_machine != "arm64"
+torch==2.2.1; sys_platform != "darwin" and platform_machine != "arm64"
 torch==2.3.1; sys_platform == "darwin" and platform_machine == "arm64"
 evaluate>=0.4.0
 rouge_score>=0.1.2
@@ -32,8 +32,9 @@ matplotlib>=3.7.1
 
 # transformers
 loralib>=0.1.2
+bitsandbytes>=0.43.1; sys_platform != "darwin" and platform_machine != "arm64"
 #bitsandbytes downgraded because of Mac M1/M2 support issue. See https://github.com/axolotl-ai-cloud/axolotl/issues/1436
-bitsandbytes==0.42.0
+bitsandbytes==0.42.0; sys_platform == "darwin" and platform_machine == "arm64"
 accelerate>=0.30.1
 peft>=0.7.0
 transformers>=4.43.2
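The torch and bitsandbytes pins now select per platform via PEP 508 environment markers. A quick way to see which branch applies on a given machine (a sketch using the `packaging` library, assumed to be available in the environment):

```bash
# Sketch: show the values markers are evaluated against, then evaluate the
# macOS/arm64 marker used by the torch and bitsandbytes pins above.
python -c "import sys, platform; print(sys.platform, platform.machine())"
python - <<'EOF'
from packaging.markers import Marker
m = Marker('sys_platform == "darwin" and platform_machine == "arm64"')
print("macOS/arm64 branch applies:", m.evaluate())
EOF
```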

src/version.py

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-__version__ = "7435b4bc4c0e6559fd90e89f7a3f51f9353ccf89"
+__version__ = "fc5a7031e4086a2878797fd062547da051b50e0d"
