
Commit 4cb3854

bump version to v0.5.0 (#1852)
* bump version to v0.5.0
* update news
* update news
* update supported models
* update
* fix lint
* set LMDEPLOY_VERSION 0.5.0
1 parent 5ceb464 commit 4cb3854

10 files changed: +75 additions, -64 deletions


README.md

Lines changed: 4 additions & 1 deletion
@@ -26,6 +26,7 @@ ______________________________________________________________________
 <details open>
 <summary><b>2024</b></summary>
 
+- \[2024/06\] PyTorch engine support DeepSeek-V2 and several VLMs, such as CogVLM2, Mini-InternVL, LlaVA-Next
 - \[2024/05\] Balance vision model when deploying VLMs with multiple GPUs
 - \[2024/05\] Support 4-bits weight-only quantization and inference on VMLs, such as InternVL v1.5, LLaVa, InternLMXComposer2
 - \[2024/04\] Support Llama3 and more VLMs, such as InternVL v1.1, v1.2, MiniGemini, InternLMXComposer2.
@@ -112,6 +113,7 @@ For detailed inference benchmarks in more devices and more settings, please refe
 <li>QWen (1.8B - 72B)</li>
 <li>QWen1.5 (0.5B - 110B)</li>
 <li>QWen1.5 - MoE (0.5B - 72B)</li>
+<li>QWen2 (0.5B - 72B)</li>
 <li>Baichuan (7B)</li>
 <li>Baichuan2 (7B-13B)</li>
 <li>Code Llama (7B - 34B)</li>
@@ -121,6 +123,7 @@ For detailed inference benchmarks in more devices and more settings, please refe
 <li>YI (6B-34B)</li>
 <li>Mistral (7B)</li>
 <li>DeepSeek-MoE (16B)</li>
+<li>DeepSeek-V2 (16B, 236B)</li>
 <li>Mixtral (8x7B, 8x22B)</li>
 <li>Gemma (2B - 7B)</li>
 <li>Dbrx (132B)</li>
@@ -162,7 +165,7 @@ pip install lmdeploy
 Since v0.3.0, The default prebuilt package is compiled on **CUDA 12**. However, if CUDA 11+ is required, you can install lmdeploy by:
 
 ```shell
-export LMDEPLOY_VERSION=0.3.0
+export LMDEPLOY_VERSION=0.5.0
 export PYTHON_VERSION=38
 pip install https://github.com/InternLM/lmdeploy/releases/download/v${LMDEPLOY_VERSION}/lmdeploy-${LMDEPLOY_VERSION}+cu118-cp${PYTHON_VERSION}-cp${PYTHON_VERSION}-manylinux2014_x86_64.whl --extra-index-url https://download.pytorch.org/whl/cu118
 ```
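
With the bumped value, the templated URL in the snippet above expands to a concrete wheel; a minimal sketch assuming Python 3.8 (`PYTHON_VERSION=38`) on a CUDA 11.8 host:

```shell
# Expansion of the install command with LMDEPLOY_VERSION=0.5.0 and PYTHON_VERSION=38
pip install https://github.com/InternLM/lmdeploy/releases/download/v0.5.0/lmdeploy-0.5.0+cu118-cp38-cp38-manylinux2014_x86_64.whl \
    --extra-index-url https://download.pytorch.org/whl/cu118
```

The same expansion applies to the get_started and cogvlm install snippets changed below; only the version value differs.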

README_zh-CN.md

Lines changed: 4 additions & 1 deletion
@@ -26,6 +26,7 @@ ______________________________________________________________________
 <details open>
 <summary><b>2024</b></summary>
 
+- \[2024/06\] The PyTorch engine supports DeepSeek-V2 and several VLMs, such as CogVLM2, Mini-InternVL, LlaVA-Next
 - \[2024/05\] When deploying VLMs on multiple GPUs, the vision part of the model can be balanced across the cards
 - \[2024/05\] Support 4-bit weight-only quantization and inference for VLMs such as InternVL v1.5, LLaVa, InternLMXComposer2
 - \[2024/04\] Support Llama3 and VLMs such as InternVL v1.1, v1.2, MiniGemini, InternLM-XComposer2
@@ -113,6 +114,7 @@ The LMDeploy TurboMind engine has excellent inference performance across models of various sizes
 <li>QWen (1.8B - 72B)</li>
 <li>QWen1.5 (0.5B - 110B)</li>
 <li>QWen1.5 - MoE (0.5B - 72B)</li>
+<li>QWen2 (0.5B - 72B)</li>
 <li>Baichuan (7B)</li>
 <li>Baichuan2 (7B-13B)</li>
 <li>Code Llama (7B - 34B)</li>
@@ -122,6 +124,7 @@ The LMDeploy TurboMind engine has excellent inference performance across models of various sizes
 <li>YI (6B-34B)</li>
 <li>Mistral (7B)</li>
 <li>DeepSeek-MoE (16B)</li>
+<li>DeepSeek-V2 (16B, 236B)</li>
 <li>Mixtral (8x7B, 8x22B)</li>
 <li>Gemma (2B - 7B)</li>
 <li>Dbrx (132B)</li>
@@ -163,7 +166,7 @@ pip install lmdeploy
 Since v0.3.0, the LMDeploy prebuilt packages are compiled against CUDA 12 by default. To install LMDeploy under CUDA 11+, run the following commands:
 
 ```shell
-export LMDEPLOY_VERSION=0.3.0
+export LMDEPLOY_VERSION=0.5.0
 export PYTHON_VERSION=38
 pip install https://github.com/InternLM/lmdeploy/releases/download/v${LMDEPLOY_VERSION}/lmdeploy-${LMDEPLOY_VERSION}+cu118-cp${PYTHON_VERSION}-cp${PYTHON_VERSION}-manylinux2014_x86_64.whl --extra-index-url https://download.pytorch.org/whl/cu118
 ```

docs/en/get_started.md

Lines changed: 1 addition & 1 deletion
@@ -13,7 +13,7 @@ pip install lmdeploy
 The default prebuilt package is compiled on **CUDA 12**. However, if CUDA 11+ is required, you can install lmdeploy by:
 
 ```shell
-export LMDEPLOY_VERSION=0.4.2
+export LMDEPLOY_VERSION=0.5.0
 export PYTHON_VERSION=38
 pip install https://github.com/InternLM/lmdeploy/releases/download/v${LMDEPLOY_VERSION}/lmdeploy-${LMDEPLOY_VERSION}+cu118-cp${PYTHON_VERSION}-cp${PYTHON_VERSION}-manylinux2014_x86_64.whl --extra-index-url https://download.pytorch.org/whl/cu118
 ```

docs/en/multi_modal/cogvlm.md

Lines changed: 1 addition & 1 deletion
@@ -22,7 +22,7 @@ Install LMDeploy with pip (Python 3.8+). Refer to [Installation](https://lmdeplo
 ```shell
 # cuda 11.8
 # to get the latest version, run: pip index versions lmdeploy
-export LMDEPLOY_VERSION=0.4.2
+export LMDEPLOY_VERSION=0.5.0
 export PYTHON_VERSION=38
 pip install https://github.com/InternLM/lmdeploy/releases/download/v${LMDEPLOY_VERSION}/lmdeploy-${LMDEPLOY_VERSION}+cu118-cp${PYTHON_VERSION}-cp${PYTHON_VERSION}-manylinux2014_x86_64.whl --extra-index-url https://download.pytorch.org/whl/cu118
 # cuda 12.1

docs/en/supported_models/supported_models.md

Lines changed: 30 additions & 27 deletions
@@ -13,7 +13,8 @@
 | InternLM-XComposer | 7B | Yes | Yes | Yes | Yes |
 | InternLM-XComposer2 | 7B, 4khd-7B | Yes | Yes | Yes | Yes |
 | QWen | 1.8B - 72B | Yes | Yes | Yes | Yes |
-| QWen1.5 | 1.8B - 72B | Yes | Yes | Yes | Yes |
+| QWen1.5 | 1.8B - 110B | Yes | Yes | Yes | Yes |
+| QWen2 | 1.5B - 72B | Yes | Yes | Yes | Yes |
 | Mistral | 7B | Yes | Yes | Yes | No |
 | QWen-VL | 7B | Yes | Yes | Yes | Yes |
 | DeepSeek-VL | 7B | Yes | Yes | Yes | Yes |
@@ -35,29 +36,31 @@ The TurboMind engine doesn't support window attention. Therefore, for models tha
 
 ## Models supported by PyTorch
 
-| Model | Size | FP16/BF16 | KV INT8 | W8A8 |
-| :-----------------: | :--------: | :-------: | :-----: | :--: |
-| Llama | 7B - 65B | Yes | No | Yes |
-| Llama2 | 7B - 70B | Yes | No | Yes |
-| Llama3 | 8B, 70B | Yes | No | Yes |
-| InternLM | 7B - 20B | Yes | No | Yes |
-| InternLM2 | 7B - 20B | Yes | No | - |
-| InternLM2.5 | 7B | Yes | No | - |
-| Baichuan2 | 7B - 13B | Yes | No | Yes |
-| ChatGLM2 | 6B | Yes | No | No |
-| Falcon | 7B - 180B | Yes | No | No |
-| YI | 6B - 34B | Yes | No | No |
-| Mistral | 7B | Yes | No | No |
-| Mixtral | 8x7B | Yes | No | No |
-| QWen | 1.8B - 72B | Yes | No | No |
-| QWen1.5 | 0.5B - 72B | Yes | No | No |
-| QWen1.5-MoE | A2.7B | Yes | No | No |
-| DeepSeek-MoE | 16B | Yes | No | No |
-| Gemma | 2B-7B | Yes | No | No |
-| Dbrx | 132B | Yes | No | No |
-| StarCoder2 | 3B-15B | Yes | No | No |
-| Phi-3-mini | 3.8B | Yes | No | No |
-| CogVLM-Chat | 17B | Yes | No | No |
-| CogVLM2-Chat | 19B | Yes | No | No |
-| LLaVA(1.5,1.6) | 7B-34B | Yes | No | No |
-| InternVL-Chat(v1.5) | 2B-26B | Yes | No | No |
+| Model | Size | FP16/BF16 | KV INT8 | W8A8 |
+| :-----------------: | :---------: | :-------: | :-----: | :--: |
+| Llama | 7B - 65B | Yes | No | Yes |
+| Llama2 | 7B - 70B | Yes | No | Yes |
+| Llama3 | 8B, 70B | Yes | No | Yes |
+| InternLM | 7B - 20B | Yes | No | Yes |
+| InternLM2 | 7B - 20B | Yes | No | - |
+| InternLM2.5 | 7B | Yes | No | - |
+| Baichuan2 | 7B - 13B | Yes | No | Yes |
+| ChatGLM2 | 6B | Yes | No | No |
+| Falcon | 7B - 180B | Yes | No | No |
+| YI | 6B - 34B | Yes | No | No |
+| Mistral | 7B | Yes | No | No |
+| Mixtral | 8x7B | Yes | No | No |
+| QWen | 1.8B - 72B | Yes | No | No |
+| QWen1.5 | 0.5B - 110B | Yes | No | No |
+| QWen1.5-MoE | A2.7B | Yes | No | No |
+| QWen2 | 0.5B - 72B | Yes | No | No |
+| DeepSeek-MoE | 16B | Yes | No | No |
+| DeepSeek-V2 | 16B, 236B | Yes | No | No |
+| Gemma | 2B-7B | Yes | No | No |
+| Dbrx | 132B | Yes | No | No |
+| StarCoder2 | 3B-15B | Yes | No | No |
+| Phi-3-mini | 3.8B | Yes | No | No |
+| CogVLM-Chat | 17B | Yes | No | No |
+| CogVLM2-Chat | 19B | Yes | No | No |
+| LLaVA(1.5,1.6) | 7B-34B | Yes | No | No |
+| InternVL-Chat(v1.5) | 2B-26B | Yes | No | No |
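
The newly listed DeepSeek-V2 appears only in the PyTorch-engine table, so serving it means selecting that backend explicitly. A hedged sketch, not part of this commit: the model id is only illustrative, and the flag name is assumed from the current CLI help:

```shell
# DeepSeek-V2 is listed for the PyTorch engine only, so pick that backend explicitly.
lmdeploy serve api_server deepseek-ai/DeepSeek-V2-Lite-Chat --backend pytorch
```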

docs/zh_cn/get_started.md

Lines changed: 1 addition & 1 deletion
@@ -13,7 +13,7 @@ pip install lmdeploy
 The LMDeploy prebuilt packages are compiled against CUDA 12 by default. To install LMDeploy under CUDA 11+, run the following commands:
 
 ```shell
-export LMDEPLOY_VERSION=0.4.2
+export LMDEPLOY_VERSION=0.5.0
 export PYTHON_VERSION=38
 pip install https://github.com/InternLM/lmdeploy/releases/download/v${LMDEPLOY_VERSION}/lmdeploy-${LMDEPLOY_VERSION}+cu118-cp${PYTHON_VERSION}-cp${PYTHON_VERSION}-manylinux2014_x86_64.whl --extra-index-url https://download.pytorch.org/whl/cu118
 ```

docs/zh_cn/multi_modal/cogvlm.md

Lines changed: 1 addition & 1 deletion
@@ -21,7 +21,7 @@ pip install torch==2.2.2 torchvision==0.17.2 xformers==0.0.26 --index-url https:
 
 ```shell
 # cuda 11.8
-export LMDEPLOY_VERSION=0.4.2
+export LMDEPLOY_VERSION=0.5.0
 export PYTHON_VERSION=38
 pip install https://github.com/InternLM/lmdeploy/releases/download/v${LMDEPLOY_VERSION}/lmdeploy-${LMDEPLOY_VERSION}+cu118-cp${PYTHON_VERSION}-cp${PYTHON_VERSION}-manylinux2014_x86_64.whl --extra-index-url https://download.pytorch.org/whl/cu118
 # cuda 12.1

docs/zh_cn/supported_models/supported_models.md

Lines changed: 30 additions & 27 deletions
@@ -13,7 +13,8 @@
 | InternLM-XComposer | 7B | Yes | Yes | Yes | Yes |
 | InternLM-XComposer2 | 7B, 4khd-7B | Yes | Yes | Yes | Yes |
 | QWen | 1.8B - 72B | Yes | Yes | Yes | Yes |
-| QWen1.5 | 1.8B - 72B | Yes | Yes | Yes | Yes |
+| QWen1.5 | 1.8B - 110B | Yes | Yes | Yes | Yes |
+| QWen2 | 1.5B - 72B | Yes | Yes | Yes | Yes |
 | Mistral | 7B | Yes | Yes | Yes | No |
 | QWen-VL | 7B | Yes | Yes | Yes | Yes |
 | DeepSeek-VL | 7B | Yes | Yes | Yes | Yes |
@@ -35,29 +36,31 @@ The turbomind engine does not support window attention, so for models that use window att
 
 ### Models supported by PyTorch
 
-| Model | Size | FP16/BF16 | KV INT8 | W8A8 |
-| :-----------------: | :--------: | :-------: | :-----: | :--: |
-| Llama | 7B - 65B | Yes | No | Yes |
-| Llama2 | 7B - 70B | Yes | No | Yes |
-| Llama3 | 8B, 70B | Yes | No | Yes |
-| InternLM | 7B - 20B | Yes | No | Yes |
-| InternLM2 | 7B - 20B | Yes | No | - |
-| InternLM2.5 | 7B | Yes | No | - |
-| Baichuan2 | 7B - 13B | Yes | No | Yes |
-| ChatGLM2 | 6B | Yes | No | No |
-| Falcon | 7B - 180B | Yes | No | No |
-| YI | 6B - 34B | Yes | No | No |
-| Mistral | 7B | Yes | No | No |
-| Mixtral | 8x7B | Yes | No | No |
-| QWen | 1.8B - 72B | Yes | No | No |
-| QWen1.5 | 0.5B - 72B | Yes | No | No |
-| QWen1.5-MoE | A2.7B | Yes | No | No |
-| DeepSeek-MoE | 16B | Yes | No | No |
-| Gemma | 2B-7B | Yes | No | No |
-| Dbrx | 132B | Yes | No | No |
-| StarCoder2 | 3B-15B | Yes | No | No |
-| Phi-3-mini | 3.8B | Yes | No | No |
-| CogVLM-Chat | 17B | Yes | No | No |
-| CogVLM2-Chat | 19B | Yes | No | No |
-| LLaVA(1.5,1.6) | 7B-34B | Yes | No | No |
-| InternVL-Chat(v1.5) | 2B-26B | Yes | No | No |
+| Model | Size | FP16/BF16 | KV INT8 | W8A8 |
+| :-----------------: | :---------: | :-------: | :-----: | :--: |
+| Llama | 7B - 65B | Yes | No | Yes |
+| Llama2 | 7B - 70B | Yes | No | Yes |
+| Llama3 | 8B, 70B | Yes | No | Yes |
+| InternLM | 7B - 20B | Yes | No | Yes |
+| InternLM2 | 7B - 20B | Yes | No | - |
+| InternLM2.5 | 7B | Yes | No | - |
+| Baichuan2 | 7B - 13B | Yes | No | Yes |
+| ChatGLM2 | 6B | Yes | No | No |
+| Falcon | 7B - 180B | Yes | No | No |
+| YI | 6B - 34B | Yes | No | No |
+| Mistral | 7B | Yes | No | No |
+| Mixtral | 8x7B | Yes | No | No |
+| QWen | 1.8B - 72B | Yes | No | No |
+| QWen1.5 | 0.5B - 110B | Yes | No | No |
+| QWen2 | 0.5B - 72B | Yes | No | No |
+| QWen1.5-MoE | A2.7B | Yes | No | No |
+| DeepSeek-MoE | 16B | Yes | No | No |
+| DeepSeek-V2 | 16B, 236B | Yes | No | No |
+| Gemma | 2B-7B | Yes | No | No |
+| Dbrx | 132B | Yes | No | No |
+| StarCoder2 | 3B-15B | Yes | No | No |
+| Phi-3-mini | 3.8B | Yes | No | No |
+| CogVLM-Chat | 17B | Yes | No | No |
+| CogVLM2-Chat | 19B | Yes | No | No |
+| LLaVA(1.5,1.6) | 7B-34B | Yes | No | No |
+| InternVL-Chat(v1.5) | 2B-26B | Yes | No | No |

lmdeploy/cli/utils.py

Lines changed: 2 additions & 3 deletions
@@ -379,9 +379,8 @@ def cache_max_entry_count(parser):
 '--cache-max-entry-count',
 type=float,
 default=0.8,
-help=
-'The percentage of free gpu memory occupied by the k/v cache, excluding weights'
-)
+help='The percentage of free gpu memory occupied by the k/v '
+'cache, excluding weights ')
 
 @staticmethod
 def adapters(parser):
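
The replacement above only splits the help text into two adjacent string literals, which Python concatenates at parse time, so the flag's behaviour is unchanged. As a usage sketch (the model id is illustrative and not part of this commit), the option is passed on the command line like this:

```shell
# Limit the k/v cache to 50% of free GPU memory instead of the 0.8 default.
# "internlm/internlm2-chat-7b" is only an illustrative model id.
lmdeploy serve api_server internlm/internlm2-chat-7b --cache-max-entry-count 0.5
```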

lmdeploy/version.py

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 # Copyright (c) OpenMMLab. All rights reserved.
 from typing import Tuple
 
-__version__ = '0.4.2'
+__version__ = '0.5.0'
 short_version = __version__
 
 
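
A quick post-install sanity check that an environment actually picked up the bump; this simply reads the `__version__` field changed above:

```shell
python -c "import lmdeploy; print(lmdeploy.__version__)"  # expected: 0.5.0
```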
