
[Feature] Prevent OOM Crashes in sglang with Large Batches or Image Inputs #6239

Open · yhyang201 opened this issue May 12, 2025 · 1 comment

@yhyang201 (Contributor)

Checklist

Motivation

  1. I tried using the OpenAI batches API, but noticed that when the number of requests becomes very large, it's quite easy to run into out-of-memory (OOM) errors that crash sglang; the script below reproduces this.
  2. I've also seen similar OOM crashes in sglang when using an MLLM and sending requests with large images (see the sketch after the batch script below).

Do you think it's necessary to proactively prevent these cases? If so, what would be a good approach to handle them?

from sglang.utils import launch_server_cmd
from sglang.utils import wait_for_server, print_highlight, terminate_process

import json
from openai import OpenAI


# Launch an sglang server; launch_server_cmd returns the server process handle
# and the port the server is listening on.
server_process, port = launch_server_cmd(
    "python3 -m sglang.launch_server --model-path qwen/qwen2.5-0.5b-instruct --host 0.0.0.0 --mem-fraction-static 0.8 --port 8000"
)

wait_for_server(f"http://localhost:{port}")
print(f"Server started on http://localhost:{port}")

# OpenAI-compatible client pointed at the local sglang server.
client = OpenAI(base_url=f"http://127.0.0.1:{port}/v1", api_key="None")

# Build 10,000 identical chat requests in the OpenAI batch-file (JSONL) format.
requests = [
    {
        "custom_id": f"request-{i}",
        "method": "POST",
        "url": "/chat/completions",
        "body": {
            "model": "qwen/qwen2.5-0.5b-instruct",
            "messages": [{"role": "user", "content": "What is Python?"}],
            "max_tokens": 50,
        },
    } for i in range(10000)
]

# Write the requests to a JSONL file so it can be uploaded as a batch input file.
input_file_path = "batch_requests2.jsonl"

with open(input_file_path, "w") as f:
    for req in requests:
        f.write(json.dumps(req) + "\n")

with open(input_file_path, "rb") as f:
    file_response = client.files.create(file=f, purpose="batch")

# Submit the batch job; with this many requests the server can run out of memory and crash.
batch_response = client.batches.create(
    input_file_id=file_response.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
)

print_highlight(f"Batch job created with ID: {batch_response.id}")

Related resources

No response

@m0g1cian

Could you try --disable-fast-image-processor and --grammar-backend none? That should offload image preprocessing entirely to the CPU and reduce the VRAM footprint, I think.
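For reference, applying the suggested flags to the launch command from the reproduction script above would look roughly like the sketch below. The flag names are taken from this comment; whether they are available in your sglang version should be checked against python3 -m sglang.launch_server --help.

# Sketch: the same launch as in the reproduction script, with the flags suggested
# above appended. Flag names come from this comment and may vary by sglang version.
server_process, port = launch_server_cmd(
    "python3 -m sglang.launch_server --model-path qwen/qwen2.5-0.5b-instruct "
    "--host 0.0.0.0 --mem-fraction-static 0.8 --port 8000 "
    "--disable-fast-image-processor --grammar-backend none"
)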
