
Commit af0fcf2: Bump version to v0.6.5 (#2955)

* bump version to v0.6.5
* update supported models

1 parent b7a56c7 commit af0fcf2

File tree

5 files changed: +185 -179 lines changed


docs/en/get_started/installation.md

Lines changed: 1 addition & 1 deletion

@@ -23,7 +23,7 @@ pip install lmdeploy
The default prebuilt package is compiled on **CUDA 12**. If CUDA 11+ (>=11.3) is required, you can install lmdeploy by running:

```shell
-export LMDEPLOY_VERSION=0.6.4
+export LMDEPLOY_VERSION=0.6.5
export PYTHON_VERSION=38
pip install https://github.com/InternLM/lmdeploy/releases/download/v${LMDEPLOY_VERSION}/lmdeploy-${LMDEPLOY_VERSION}+cu118-cp${PYTHON_VERSION}-cp${PYTHON_VERSION}-manylinux2014_x86_64.whl --extra-index-url https://download.pytorch.org/whl/cu118
```
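
A quick way to confirm the bump took effect after installation (a minimal check; `pip show` prints the installed package metadata):

```shell
# Should report Version: 0.6.5 after a successful install
pip show lmdeploy | grep Version
```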

docs/en/supported_models/supported_models.md

Lines changed: 91 additions & 88 deletions

@@ -4,104 +4,107 @@ The following tables detail the models supported by LMDeploy's TurboMind engine

## TurboMind on CUDA Platform

-| Model | Size | Type | FP16/BF16 | KV INT8 | KV INT4 | W4A16 |
-| :-------------------: | :------------: | :--: | :-------: | :-----: | :-----: | :---: |
-| Llama | 7B - 65B | LLM | Yes | Yes | Yes | Yes |
-| Llama2 | 7B - 70B | LLM | Yes | Yes | Yes | Yes |
-| Llama3 | 8B, 70B | LLM | Yes | Yes | Yes | Yes |
-| Llama3.1 | 8B, 70B | LLM | Yes | Yes | Yes | Yes |
-| Llama3.2 | 1B, 3B | LLM | Yes | Yes\* | Yes\* | Yes |
-| InternLM | 7B - 20B | LLM | Yes | Yes | Yes | Yes |
-| InternLM2 | 7B - 20B | LLM | Yes | Yes | Yes | Yes |
-| InternLM2.5 | 7B | LLM | Yes | Yes | Yes | Yes |
-| InternLM-XComposer2 | 7B, 4khd-7B | MLLM | Yes | Yes | Yes | Yes |
-| InternLM-XComposer2.5 | 7B | MLLM | Yes | Yes | Yes | Yes |
-| Qwen | 1.8B - 72B | LLM | Yes | Yes | Yes | Yes |
-| Qwen1.5 | 1.8B - 110B | LLM | Yes | Yes | Yes | Yes |
-| Qwen2 | 0.5B - 72B | LLM | Yes | Yes\* | Yes\* | Yes |
-| Qwen2-MoE | 57BA14B | LLM | Yes | Yes | Yes | Yes |
-| Qwen2.5 | 0.5B - 72B | LLM | Yes | Yes | Yes | Yes |
-| Mistral | 7B | LLM | Yes | Yes | Yes | No |
-| Mixtral | 8x7B, 8x22B | LLM | Yes | Yes | Yes | Yes |
-| DeepSeek-V2 | 16B, 236B | LLM | Yes | Yes | Yes | No |
-| DeepSeek-V2.5 | 236B | LLM | Yes | Yes | Yes | No |
-| Qwen-VL | 7B | MLLM | Yes | Yes | Yes | Yes |
-| DeepSeek-VL | 7B | MLLM | Yes | Yes | Yes | Yes |
-| Baichuan | 7B | LLM | Yes | Yes | Yes | Yes |
-| Baichuan2 | 7B | LLM | Yes | Yes | Yes | Yes |
-| Code Llama | 7B - 34B | LLM | Yes | Yes | Yes | No |
-| YI | 6B - 34B | LLM | Yes | Yes | Yes | Yes |
-| LLaVA(1.5,1.6) | 7B - 34B | MLLM | Yes | Yes | Yes | Yes |
-| InternVL | v1.1 - v1.5 | MLLM | Yes | Yes | Yes | Yes |
-| InternVL2 | 1-2B, 8B - 76B | MLLM | Yes | Yes\* | Yes\* | Yes |
-| ChemVLM | 8B - 26B | MLLM | Yes | Yes | Yes | Yes |
-| MiniCPM-Llama3-V-2_5 | - | MLLM | Yes | Yes | Yes | Yes |
-| MiniCPM-V-2_6 | - | MLLM | Yes | Yes | Yes | Yes |
-| MiniGeminiLlama | 7B | MLLM | Yes | - | - | Yes |
-| GLM4 | 9B | LLM | Yes | Yes | Yes | Yes |
-| CodeGeeX4 | 9B | LLM | Yes | Yes | Yes | - |
-| Molmo | 7B-D,72B | MLLM | Yes | Yes | Yes | No |
+| Model | Size | Type | FP16/BF16 | KV INT8 | KV INT4 | W4A16 |
+| :------------------------------: | :--------------: | :--: | :-------: | :-----: | :-----: | :---: |
+| Llama | 7B - 65B | LLM | Yes | Yes | Yes | Yes |
+| Llama2 | 7B - 70B | LLM | Yes | Yes | Yes | Yes |
+| Llama3 | 8B, 70B | LLM | Yes | Yes | Yes | Yes |
+| Llama3.1 | 8B, 70B | LLM | Yes | Yes | Yes | Yes |
+| Llama3.2<sup>\[2\]</sup> | 1B, 3B | LLM | Yes | Yes\* | Yes\* | Yes |
+| InternLM | 7B - 20B | LLM | Yes | Yes | Yes | Yes |
+| InternLM2 | 7B - 20B | LLM | Yes | Yes | Yes | Yes |
+| InternLM2.5 | 7B | LLM | Yes | Yes | Yes | Yes |
+| InternLM-XComposer2 | 7B, 4khd-7B | MLLM | Yes | Yes | Yes | Yes |
+| InternLM-XComposer2.5 | 7B | MLLM | Yes | Yes | Yes | Yes |
+| Qwen | 1.8B - 72B | LLM | Yes | Yes | Yes | Yes |
+| Qwen1.5<sup>\[1\]</sup> | 1.8B - 110B | LLM | Yes | Yes | Yes | Yes |
+| Qwen2<sup>\[2\]</sup> | 0.5B - 72B | LLM | Yes | Yes\* | Yes\* | Yes |
+| Qwen2-MoE | 57BA14B | LLM | Yes | Yes | Yes | Yes |
+| Qwen2.5<sup>\[2\]</sup> | 0.5B - 72B | LLM | Yes | Yes\* | Yes\* | Yes |
+| Mistral<sup>\[1\]</sup> | 7B | LLM | Yes | Yes | Yes | No |
+| Mixtral | 8x7B, 8x22B | LLM | Yes | Yes | Yes | Yes |
+| DeepSeek-V2 | 16B, 236B | LLM | Yes | Yes | Yes | No |
+| DeepSeek-V2.5 | 236B | LLM | Yes | Yes | Yes | No |
+| Qwen-VL | 7B | MLLM | Yes | Yes | Yes | Yes |
+| DeepSeek-VL | 7B | MLLM | Yes | Yes | Yes | Yes |
+| Baichuan | 7B | LLM | Yes | Yes | Yes | Yes |
+| Baichuan2 | 7B | LLM | Yes | Yes | Yes | Yes |
+| Code Llama | 7B - 34B | LLM | Yes | Yes | Yes | No |
+| YI | 6B - 34B | LLM | Yes | Yes | Yes | Yes |
+| LLaVA(1.5,1.6) | 7B - 34B | MLLM | Yes | Yes | Yes | Yes |
+| InternVL | v1.1 - v1.5 | MLLM | Yes | Yes | Yes | Yes |
+| InternVL2<sup>\[2\]</sup> | 1 - 2B, 8B - 76B | MLLM | Yes | Yes\* | Yes\* | Yes |
+| InternVL2.5(MPO)<sup>\[2\]</sup> | 1 - 78B | MLLM | Yes | Yes\* | Yes\* | Yes |
+| ChemVLM | 8B - 26B | MLLM | Yes | Yes | Yes | Yes |
+| MiniCPM-Llama3-V-2_5 | - | MLLM | Yes | Yes | Yes | Yes |
+| MiniCPM-V-2_6 | - | MLLM | Yes | Yes | Yes | Yes |
+| MiniGeminiLlama | 7B | MLLM | Yes | - | - | Yes |
+| GLM4 | 9B | LLM | Yes | Yes | Yes | Yes |
+| CodeGeeX4 | 9B | LLM | Yes | Yes | Yes | - |
+| Molmo | 7B-D,72B | MLLM | Yes | Yes | Yes | No |

"-" means not verified yet.

```{note}
-* The TurboMind engine doesn't support window attention. Therefore, for models that have applied window attention and have the corresponding switch "use_sliding_window" enabled, such as Mistral, Qwen1.5 and etc., please choose the PyTorch engine for inference.
-* When the head_dim of a model is not 128, such as llama3.2-1B, qwen2-0.5B and internvl2-1B, turbomind doesn't support its kv cache 4/8 bit quantization and inference
+* [1] The TurboMind engine doesn't support window attention. Therefore, for models that use window attention and have the corresponding "use_sliding_window" switch enabled, such as Mistral and Qwen1.5, please choose the PyTorch engine for inference.
+* [2] When a model's head_dim is not 128, as in llama3.2-1B, qwen2-0.5B and internvl2-1B, TurboMind doesn't support 4/8-bit quantization and inference of its kv cache.
```
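
Both notes map to runtime choices. A minimal sketch of each, assuming the `lmdeploy chat` CLI with its `--backend` and `--quant-policy` options; the model IDs are illustrative:

```shell
# [1] Window-attention models such as Mistral need the PyTorch engine
lmdeploy chat mistralai/Mistral-7B-Instruct-v0.2 --backend pytorch

# [2] 8-bit kv cache quantization on TurboMind, valid only when head_dim is 128
lmdeploy chat internlm/internlm2_5-7b-chat --backend turbomind --quant-policy 8
```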

## PyTorchEngine on CUDA Platform

-| Model | Size | Type | FP16/BF16 | KV INT8 | KV INT4 | W8A8 | W4A16 |
-| :------------: | :---------: | :--: | :-------: | :-----: | :-----: | :--: | :---: |
-| Llama | 7B - 65B | LLM | Yes | Yes | Yes | Yes | Yes |
-| Llama2 | 7B - 70B | LLM | Yes | Yes | Yes | Yes | Yes |
-| Llama3 | 8B, 70B | LLM | Yes | Yes | Yes | Yes | Yes |
-| Llama3.1 | 8B, 70B | LLM | Yes | Yes | Yes | Yes | Yes |
-| Llama3.2 | 1B, 3B | LLM | Yes | Yes | Yes | Yes | Yes |
-| Llama3.2-VL | 11B, 90B | MLLM | Yes | Yes | Yes | - | - |
-| InternLM | 7B - 20B | LLM | Yes | Yes | Yes | Yes | Yes |
-| InternLM2 | 7B - 20B | LLM | Yes | Yes | Yes | Yes | Yes |
-| InternLM2.5 | 7B | LLM | Yes | Yes | Yes | Yes | Yes |
-| Baichuan2 | 7B | LLM | Yes | Yes | Yes | Yes | No |
-| Baichuan2 | 13B | LLM | Yes | Yes | Yes | No | No |
-| ChatGLM2 | 6B | LLM | Yes | Yes | Yes | No | No |
-| Falcon | 7B - 180B | LLM | Yes | Yes | Yes | No | No |
-| YI | 6B - 34B | LLM | Yes | Yes | Yes | Yes | Yes |
-| Mistral | 7B | LLM | Yes | Yes | Yes | Yes | Yes |
-| Mixtral | 8x7B, 8x22B | LLM | Yes | Yes | Yes | No | No |
-| QWen | 1.8B - 72B | LLM | Yes | Yes | Yes | Yes | Yes |
-| QWen1.5 | 0.5B - 110B | LLM | Yes | Yes | Yes | Yes | Yes |
-| QWen1.5-MoE | A2.7B | LLM | Yes | Yes | Yes | No | No |
-| QWen2 | 0.5B - 72B | LLM | Yes | Yes | No | Yes | Yes |
-| Qwen2.5 | 0.5B - 72B | LLM | Yes | Yes | No | Yes | Yes |
-| QWen2-VL | 2B, 7B | MLLM | Yes | Yes | No | No | No |
-| DeepSeek-MoE | 16B | LLM | Yes | No | No | No | No |
-| DeepSeek-V2 | 16B, 236B | LLM | Yes | No | No | No | No |
-| DeepSeek-V2.5 | 236B | LLM | Yes | No | No | No | No |
-| MiniCPM3 | 4B | LLM | Yes | Yes | Yes | No | No |
-| MiniCPM-V-2_6 | 8B | LLM | Yes | No | No | No | Yes |
-| Gemma | 2B-7B | LLM | Yes | Yes | Yes | No | No |
-| Dbrx | 132B | LLM | Yes | Yes | Yes | No | No |
-| StarCoder2 | 3B-15B | LLM | Yes | Yes | Yes | No | No |
-| Phi-3-mini | 3.8B | LLM | Yes | Yes | Yes | Yes | Yes |
-| Phi-3-vision | 4.2B | MLLM | Yes | Yes | Yes | - | - |
-| CogVLM-Chat | 17B | MLLM | Yes | Yes | Yes | - | - |
-| CogVLM2-Chat | 19B | MLLM | Yes | Yes | Yes | - | - |
-| LLaVA(1.5,1.6) | 7B-34B | MLLM | Yes | Yes | Yes | - | - |
-| InternVL(v1.5) | 2B-26B | MLLM | Yes | Yes | Yes | No | Yes |
-| InternVL2 | 1B-40B | MLLM | Yes | Yes | Yes | - | - |
-| Mono-InternVL | 2B | MLLM | Yes\* | Yes | Yes | - | - |
-| ChemVLM | 8B-26B | MLLM | Yes | Yes | No | - | - |
-| Gemma2 | 9B-27B | LLM | Yes | Yes | Yes | - | - |
-| GLM4 | 9B | LLM | Yes | Yes | Yes | No | No |
-| GLM-4V | 9B | MLLM | Yes | Yes | Yes | No | No |
-| CodeGeeX4 | 9B | LLM | Yes | Yes | Yes | - | - |
-| Phi-3.5-mini | 3.8B | LLM | Yes | Yes | No | - | - |
-| Phi-3.5-MoE | 16x3.8B | LLM | Yes | Yes | No | - | - |
-| Phi-3.5-vision | 4.2B | MLLM | Yes | Yes | No | - | - |
+| Model | Size | Type | FP16/BF16 | KV INT8 | KV INT4 | W8A8 | W4A16 |
+| :----------------------------: | :---------: | :--: | :-------: | :-----: | :-----: | :--: | :---: |
+| Llama | 7B - 65B | LLM | Yes | Yes | Yes | Yes | Yes |
+| Llama2 | 7B - 70B | LLM | Yes | Yes | Yes | Yes | Yes |
+| Llama3 | 8B, 70B | LLM | Yes | Yes | Yes | Yes | Yes |
+| Llama3.1 | 8B, 70B | LLM | Yes | Yes | Yes | Yes | Yes |
+| Llama3.2 | 1B, 3B | LLM | Yes | Yes | Yes | Yes | Yes |
+| Llama3.2-VL | 11B, 90B | MLLM | Yes | Yes | Yes | - | - |
+| InternLM | 7B - 20B | LLM | Yes | Yes | Yes | Yes | Yes |
+| InternLM2 | 7B - 20B | LLM | Yes | Yes | Yes | Yes | Yes |
+| InternLM2.5 | 7B | LLM | Yes | Yes | Yes | Yes | Yes |
+| Baichuan2 | 7B | LLM | Yes | Yes | Yes | Yes | No |
+| Baichuan2 | 13B | LLM | Yes | Yes | Yes | No | No |
+| ChatGLM2 | 6B | LLM | Yes | Yes | Yes | No | No |
+| Falcon | 7B - 180B | LLM | Yes | Yes | Yes | No | No |
+| YI | 6B - 34B | LLM | Yes | Yes | Yes | Yes | Yes |
+| Mistral | 7B | LLM | Yes | Yes | Yes | Yes | Yes |
+| Mixtral | 8x7B, 8x22B | LLM | Yes | Yes | Yes | No | No |
+| QWen | 1.8B - 72B | LLM | Yes | Yes | Yes | Yes | Yes |
+| QWen1.5 | 0.5B - 110B | LLM | Yes | Yes | Yes | Yes | Yes |
+| QWen1.5-MoE | A2.7B | LLM | Yes | Yes | Yes | No | No |
+| QWen2 | 0.5B - 72B | LLM | Yes | Yes | No | Yes | Yes |
+| Qwen2.5 | 0.5B - 72B | LLM | Yes | Yes | No | Yes | Yes |
+| QWen2-VL | 2B, 7B | MLLM | Yes | Yes | No | No | Yes |
+| DeepSeek-MoE | 16B | LLM | Yes | No | No | No | No |
+| DeepSeek-V2 | 16B, 236B | LLM | Yes | No | No | No | No |
+| DeepSeek-V2.5 | 236B | LLM | Yes | No | No | No | No |
+| MiniCPM3 | 4B | LLM | Yes | Yes | Yes | No | No |
+| MiniCPM-V-2_6 | 8B | LLM | Yes | No | No | No | Yes |
+| Gemma | 2B-7B | LLM | Yes | Yes | Yes | No | No |
+| Dbrx | 132B | LLM | Yes | Yes | Yes | No | No |
+| StarCoder2 | 3B-15B | LLM | Yes | Yes | Yes | No | No |
+| Phi-3-mini | 3.8B | LLM | Yes | Yes | Yes | Yes | Yes |
+| Phi-3-vision | 4.2B | MLLM | Yes | Yes | Yes | - | - |
+| CogVLM-Chat | 17B | MLLM | Yes | Yes | Yes | - | - |
+| CogVLM2-Chat | 19B | MLLM | Yes | Yes | Yes | - | - |
+| LLaVA(1.5,1.6)<sup>\[2\]</sup> | 7B-34B | MLLM | No | No | No | No | No |
+| InternVL(v1.5) | 2B-26B | MLLM | Yes | Yes | Yes | No | Yes |
+| InternVL2 | 1B-76B | MLLM | Yes | Yes | Yes | - | - |
+| InternVL2.5(MPO) | 1B-78B | MLLM | Yes | Yes | Yes | - | - |
+| Mono-InternVL<sup>\[1\]</sup> | 2B | MLLM | Yes | Yes | Yes | - | - |
+| ChemVLM | 8B-26B | MLLM | Yes | Yes | No | - | - |
+| Gemma2 | 9B-27B | LLM | Yes | Yes | Yes | - | - |
+| GLM4 | 9B | LLM | Yes | Yes | Yes | No | No |
+| GLM-4V | 9B | MLLM | Yes | Yes | Yes | No | Yes |
+| CodeGeeX4 | 9B | LLM | Yes | Yes | Yes | - | - |
+| Phi-3.5-mini | 3.8B | LLM | Yes | Yes | No | - | - |
+| Phi-3.5-MoE | 16x3.8B | LLM | Yes | Yes | No | - | - |
+| Phi-3.5-vision | 4.2B | MLLM | Yes | Yes | No | - | - |

```{note}
-* Currently Mono-InternVL does not support FP16 due to numerical instability. Please use BF16 instead.
+* [1] Currently Mono-InternVL does not support FP16 due to numerical instability. Please use BF16 instead.
+* [2] The PyTorch engine removed support for the original llava models after v0.6.4. Please use the corresponding transformers models instead, which can be found at https://huggingface.co/llava-hf.
```
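
For note [2], the llava-hf ports are the drop-in replacements for the retired original llava checkpoints. A minimal sketch, assuming the `lmdeploy serve api_server` command; the model ID is illustrative and should be checked on https://huggingface.co/llava-hf:

```shell
# Serve the transformers port of LLaVA 1.5 on the PyTorch engine
lmdeploy serve api_server llava-hf/llava-1.5-7b-hf --backend pytorch
```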

## PyTorchEngine on Huawei Ascend Platform

docs/zh_cn/get_started/installation.md

Lines changed: 1 addition & 1 deletion

@@ -23,7 +23,7 @@ pip install lmdeploy
The default prebuilt package is compiled on **CUDA 12**. If CUDA 11+ (>=11.3) is required, you can install lmdeploy with the following commands:

```shell
-export LMDEPLOY_VERSION=0.6.4
+export LMDEPLOY_VERSION=0.6.5
export PYTHON_VERSION=38
pip install https://github.com/InternLM/lmdeploy/releases/download/v${LMDEPLOY_VERSION}/lmdeploy-${LMDEPLOY_VERSION}+cu118-cp${PYTHON_VERSION}-cp${PYTHON_VERSION}-manylinux2014_x86_64.whl --extra-index-url https://download.pytorch.org/whl/cu118
```
