### Description

### Checklist

- 1. I have searched related issues but cannot get the expected help.
- 2. The bug has not been fixed in the latest version.
- 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
- 4. If the issue you raised is not a bug but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose. Otherwise, it will be closed.
- 5. Please use English; otherwise, the issue will be closed.
### Describe the bug

In `sglang/srt/mem_cache/memory_pool.py:915-920`:

```python
def init_kv_buffer(self):
    return torch.empty(
        (2, self.layer_num, self.size, self.head_num, self.head_dim),
        dtype=self.dtype,
        device=self.device,
        pin_memory=self.pin_memory,
    )
```
When this allocates pinned memory above some size threshold, the following error occurs:

```
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: CUDA error: invalid argument
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
```
Should sglang provide a server argument `enable_pin_memory` so that `pin_memory` can be toggled dynamically?
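If such a flag were added, the allocator could attempt a pinned allocation and degrade gracefully instead of crashing. A minimal sketch (hypothetical: `alloc_kv_buffer` and the `enable_pin_memory` parameter are illustrative, not existing sglang API):

```python
import torch

def alloc_kv_buffer(shape, dtype=torch.bfloat16, enable_pin_memory=True):
    """Allocate a host-side KV buffer, optionally pinned.

    Falls back to pageable memory when pinning fails, e.g. the
    "CUDA error: invalid argument" seen for very large buffers.
    """
    if enable_pin_memory:
        try:
            return torch.empty(shape, dtype=dtype, device="cpu",
                               pin_memory=True)
        except RuntimeError:
            # Pinning failed (oversized allocation, or no CUDA device);
            # fall back to ordinary pageable host memory.
            pass
    return torch.empty(shape, dtype=dtype, device="cpu")
```

With `enable_pin_memory=False`, or when pinning fails, host-to-device copies lose the bandwidth benefit of pinned memory, but the server keeps running instead of aborting at startup.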
### Reproduction

Using torch 2.5.1 or torch 2.6.0:

```python
import torch

t = torch.empty((2, 32, 10000, 8, 128), dtype=torch.bfloat16, device="cpu", pin_memory=True)
```
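To locate the size at which pinning starts to fail on a given machine, one can bisect over the token dimension. A sketch (the `max_pinned_tokens` helper is hypothetical, and the actual limit depends on the kernel and driver):

```python
import torch

def _try_pin(size):
    """Attempt to allocate one pinned KV buffer of `size` tokens."""
    try:
        torch.empty((2, 32, size, 8, 128), dtype=torch.bfloat16,
                    device="cpu", pin_memory=True)
        return True
    except RuntimeError:
        return False

def max_pinned_tokens(hi=1 << 20, try_alloc=_try_pin):
    """Binary-search the largest size for which try_alloc(size) succeeds."""
    lo = 0
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if try_alloc(mid):
            lo = mid       # mid tokens still pin successfully
        else:
            hi = mid - 1   # pinning failed; shrink the search range
    return lo
```

Reporting the value this returns alongside the environment info would help narrow down whether the threshold is a driver limit or a kernel (`ulimit`/cgroup) limit.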
Related issues:

- deepspeedai/DeepSpeed#7150
### Environment

**Python packages**

- sglang==0.4.6
- torch==2.5.1 or torch==2.6.0

**System**

- Linux kernel: 5.10.112-005.ali5000.al8.x86_64
- GPU: NVIDIA H20
- NVIDIA driver version: 550.144.04
- CUDA version: 12.4