The following tables detail the models supported by LMDeploy's TurboMind engine and PyTorch engine.

## TurboMind on CUDA Platform

| Model                            | Size             | Type | FP16/BF16 | KV INT8 | KV INT4 | W4A16 |
| :------------------------------: | :--------------: | :--: | :-------: | :-----: | :-----: | :---: |
| Llama                            | 7B - 65B         | LLM  | Yes       | Yes     | Yes     | Yes   |
| Llama2                           | 7B - 70B         | LLM  | Yes       | Yes     | Yes     | Yes   |
| Llama3                           | 8B, 70B          | LLM  | Yes       | Yes     | Yes     | Yes   |
| Llama3.1                         | 8B, 70B          | LLM  | Yes       | Yes     | Yes     | Yes   |
| Llama3.2<sup>\[2\]</sup>         | 1B, 3B           | LLM  | Yes       | Yes\*   | Yes\*   | Yes   |
| InternLM                         | 7B - 20B         | LLM  | Yes       | Yes     | Yes     | Yes   |
| InternLM2                        | 7B - 20B         | LLM  | Yes       | Yes     | Yes     | Yes   |
| InternLM2.5                      | 7B               | LLM  | Yes       | Yes     | Yes     | Yes   |
| InternLM-XComposer2              | 7B, 4khd-7B      | MLLM | Yes       | Yes     | Yes     | Yes   |
| InternLM-XComposer2.5            | 7B               | MLLM | Yes       | Yes     | Yes     | Yes   |
| Qwen                             | 1.8B - 72B       | LLM  | Yes       | Yes     | Yes     | Yes   |
| Qwen1.5<sup>\[1\]</sup>          | 1.8B - 110B      | LLM  | Yes       | Yes     | Yes     | Yes   |
| Qwen2<sup>\[2\]</sup>            | 0.5B - 72B       | LLM  | Yes       | Yes\*   | Yes\*   | Yes   |
| Qwen2-MoE                        | 57BA14B          | LLM  | Yes       | Yes     | Yes     | Yes   |
| Qwen2.5<sup>\[2\]</sup>          | 0.5B - 72B       | LLM  | Yes       | Yes\*   | Yes\*   | Yes   |
| Mistral<sup>\[1\]</sup>          | 7B               | LLM  | Yes       | Yes     | Yes     | No    |
| Mixtral                          | 8x7B, 8x22B      | LLM  | Yes       | Yes     | Yes     | Yes   |
| DeepSeek-V2                      | 16B, 236B        | LLM  | Yes       | Yes     | Yes     | No    |
| DeepSeek-V2.5                    | 236B             | LLM  | Yes       | Yes     | Yes     | No    |
| Qwen-VL                          | 7B               | MLLM | Yes       | Yes     | Yes     | Yes   |
| DeepSeek-VL                      | 7B               | MLLM | Yes       | Yes     | Yes     | Yes   |
| Baichuan                         | 7B               | LLM  | Yes       | Yes     | Yes     | Yes   |
| Baichuan2                        | 7B               | LLM  | Yes       | Yes     | Yes     | Yes   |
| Code Llama                       | 7B - 34B         | LLM  | Yes       | Yes     | Yes     | No    |
| YI                               | 6B - 34B         | LLM  | Yes       | Yes     | Yes     | Yes   |
| LLaVA(1.5,1.6)                   | 7B - 34B         | MLLM | Yes       | Yes     | Yes     | Yes   |
| InternVL                         | v1.1 - v1.5      | MLLM | Yes       | Yes     | Yes     | Yes   |
| InternVL2<sup>\[2\]</sup>        | 1 - 2B, 8B - 76B | MLLM | Yes       | Yes\*   | Yes\*   | Yes   |
| InternVL2.5(MPO)<sup>\[2\]</sup> | 1 - 78B          | MLLM | Yes       | Yes\*   | Yes\*   | Yes   |
| ChemVLM                          | 8B - 26B         | MLLM | Yes       | Yes     | Yes     | Yes   |
| MiniCPM-Llama3-V-2_5             | -                | MLLM | Yes       | Yes     | Yes     | Yes   |
| MiniCPM-V-2_6                    | -                | MLLM | Yes       | Yes     | Yes     | Yes   |
| MiniGeminiLlama                  | 7B               | MLLM | Yes       | -       | -       | Yes   |
| GLM4                             | 9B               | LLM  | Yes       | Yes     | Yes     | Yes   |
| CodeGeeX4                        | 9B               | LLM  | Yes       | Yes     | Yes     | -     |
| Molmo                            | 7B-D, 72B        | MLLM | Yes       | Yes     | Yes     | No    |

"-" means not verified yet.

```{note}
* [1] The TurboMind engine doesn't support window attention. Therefore, for models that use window attention and have the corresponding "use_sliding_window" switch enabled, such as Mistral and Qwen1.5, please choose the PyTorch engine for inference.
* [2] When a model's head_dim is not 128, as is the case for llama3.2-1B, qwen2-0.5B and internvl2-1B, TurboMind doesn't support 4/8-bit quantization and inference of its KV cache.
```
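
The KV INT8/INT4 columns and note \[1\] both come down to engine configuration. As a minimal sketch of how they map onto LMDeploy's `pipeline` API (the model IDs below are illustrative), KV cache quantization is requested through `quant_policy`, and a sliding-window model is routed to the PyTorch engine by passing a `PytorchEngineConfig`:

```python
from lmdeploy import pipeline, PytorchEngineConfig, TurbomindEngineConfig

# KV INT8 quantization on TurboMind: quant_policy=8 (use 4 for KV INT4).
pipe = pipeline('internlm/internlm2_5-7b-chat',
                backend_config=TurbomindEngineConfig(quant_policy=8))

# Mistral relies on sliding-window attention, which TurboMind lacks,
# so it is served with the PyTorch engine instead (note [1]).
pipe = pipeline('mistralai/Mistral-7B-Instruct-v0.3',
                backend_config=PytorchEngineConfig())

print(pipe(['Hello, introduce yourself.']))
```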
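
Likewise, the W4A16 column refers to 4-bit weight-quantized (AWQ) inference. A hedged sketch, assuming a checkpoint already quantized with `lmdeploy lite auto_awq` (the local path is a placeholder):

```python
from lmdeploy import pipeline, TurbomindEngineConfig

# Serve 4-bit AWQ weights (W4A16); model_format tells TurboMind how the
# weights are stored. './internlm2_5-7b-chat-4bit' is a placeholder path.
pipe = pipeline('./internlm2_5-7b-chat-4bit',
                backend_config=TurbomindEngineConfig(model_format='awq'))
print(pipe(['Hello, introduce yourself.']))
```
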
## PyTorchEngine on CUDA Platform

| Model                          | Size        | Type | FP16/BF16 | KV INT8 | KV INT4 | W8A8 | W4A16 |
| :----------------------------: | :---------: | :--: | :-------: | :-----: | :-----: | :--: | :---: |
| Llama                          | 7B - 65B    | LLM  | Yes       | Yes     | Yes     | Yes  | Yes   |
| Llama2                         | 7B - 70B    | LLM  | Yes       | Yes     | Yes     | Yes  | Yes   |
| Llama3                         | 8B, 70B     | LLM  | Yes       | Yes     | Yes     | Yes  | Yes   |
| Llama3.1                       | 8B, 70B     | LLM  | Yes       | Yes     | Yes     | Yes  | Yes   |
| Llama3.2                       | 1B, 3B      | LLM  | Yes       | Yes     | Yes     | Yes  | Yes   |
| Llama3.2-VL                    | 11B, 90B    | MLLM | Yes       | Yes     | Yes     | -    | -     |
| InternLM                       | 7B - 20B    | LLM  | Yes       | Yes     | Yes     | Yes  | Yes   |
| InternLM2                      | 7B - 20B    | LLM  | Yes       | Yes     | Yes     | Yes  | Yes   |
| InternLM2.5                    | 7B          | LLM  | Yes       | Yes     | Yes     | Yes  | Yes   |
| Baichuan2                      | 7B          | LLM  | Yes       | Yes     | Yes     | Yes  | No    |
| Baichuan2                      | 13B         | LLM  | Yes       | Yes     | Yes     | No   | No    |
| ChatGLM2                       | 6B          | LLM  | Yes       | Yes     | Yes     | No   | No    |
| Falcon                         | 7B - 180B   | LLM  | Yes       | Yes     | Yes     | No   | No    |
| YI                             | 6B - 34B    | LLM  | Yes       | Yes     | Yes     | Yes  | Yes   |
| Mistral                        | 7B          | LLM  | Yes       | Yes     | Yes     | Yes  | Yes   |
| Mixtral                        | 8x7B, 8x22B | LLM  | Yes       | Yes     | Yes     | No   | No    |
| QWen                           | 1.8B - 72B  | LLM  | Yes       | Yes     | Yes     | Yes  | Yes   |
| QWen1.5                        | 0.5B - 110B | LLM  | Yes       | Yes     | Yes     | Yes  | Yes   |
| QWen1.5-MoE                    | A2.7B       | LLM  | Yes       | Yes     | Yes     | No   | No    |
| QWen2                          | 0.5B - 72B  | LLM  | Yes       | Yes     | No      | Yes  | Yes   |
| Qwen2.5                        | 0.5B - 72B  | LLM  | Yes       | Yes     | No      | Yes  | Yes   |
| QWen2-VL                       | 2B, 7B      | MLLM | Yes       | Yes     | No      | No   | Yes   |
| DeepSeek-MoE                   | 16B         | LLM  | Yes       | No      | No      | No   | No    |
| DeepSeek-V2                    | 16B, 236B   | LLM  | Yes       | No      | No      | No   | No    |
| DeepSeek-V2.5                  | 236B        | LLM  | Yes       | No      | No      | No   | No    |
| MiniCPM3                       | 4B          | LLM  | Yes       | Yes     | Yes     | No   | No    |
| MiniCPM-V-2_6                  | 8B          | LLM  | Yes       | No      | No      | No   | Yes   |
| Gemma                          | 2B-7B       | LLM  | Yes       | Yes     | Yes     | No   | No    |
| Dbrx                           | 132B        | LLM  | Yes       | Yes     | Yes     | No   | No    |
| StarCoder2                     | 3B-15B      | LLM  | Yes       | Yes     | Yes     | No   | No    |
| Phi-3-mini                     | 3.8B        | LLM  | Yes       | Yes     | Yes     | Yes  | Yes   |
| Phi-3-vision                   | 4.2B        | MLLM | Yes       | Yes     | Yes     | -    | -     |
| CogVLM-Chat                    | 17B         | MLLM | Yes       | Yes     | Yes     | -    | -     |
| CogVLM2-Chat                   | 19B         | MLLM | Yes       | Yes     | Yes     | -    | -     |
| LLaVA(1.5,1.6)<sup>\[2\]</sup> | 7B-34B      | MLLM | No        | No      | No      | No   | No    |
| InternVL(v1.5)                 | 2B-26B      | MLLM | Yes       | Yes     | Yes     | No   | Yes   |
| InternVL2                      | 1B-76B      | MLLM | Yes       | Yes     | Yes     | -    | -     |
| InternVL2.5(MPO)               | 1B-78B      | MLLM | Yes       | Yes     | Yes     | -    | -     |
| Mono-InternVL<sup>\[1\]</sup>  | 2B          | MLLM | Yes\*     | Yes     | Yes     | -    | -     |
| ChemVLM                        | 8B-26B      | MLLM | Yes       | Yes     | No      | -    | -     |
| Gemma2                         | 9B-27B      | LLM  | Yes       | Yes     | Yes     | -    | -     |
| GLM4                           | 9B          | LLM  | Yes       | Yes     | Yes     | No   | No    |
| GLM-4V                         | 9B          | MLLM | Yes       | Yes     | Yes     | No   | Yes   |
| CodeGeeX4                      | 9B          | LLM  | Yes       | Yes     | Yes     | -    | -     |
| Phi-3.5-mini                   | 3.8B        | LLM  | Yes       | Yes     | No      | -    | -     |
| Phi-3.5-MoE                    | 16x3.8B     | LLM  | Yes       | Yes     | No      | -    | -     |
| Phi-3.5-vision                 | 4.2B        | MLLM | Yes       | Yes     | No      | -    | -     |

```{note}
* [1] Currently Mono-InternVL does not support FP16 due to numerical instability. Please use BF16 instead.
* [2] The PyTorch engine removed support for the original llava models after v0.6.4. Please use the corresponding transformers-format models instead, which can be found at https://huggingface.co/llava-hf.
```
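
To illustrate note \[2\], here is a minimal sketch of serving one of the transformers-format llava checkpoints with the PyTorch engine (the model ID and image URL are illustrative):

```python
from lmdeploy import pipeline, PytorchEngineConfig
from lmdeploy.vl import load_image

# Use a transformers-format llava model from https://huggingface.co/llava-hf
# rather than an original llava checkpoint (unsupported after v0.6.4).
pipe = pipeline('llava-hf/llava-1.5-7b-hf',
                backend_config=PytorchEngineConfig())
image = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg')
print(pipe(('describe this image', image)))
```
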
## PyTorchEngine on Huawei Ascend Platform