feat: enable prefill cudagraph by default by sufubao · Pull Request #1352 · ModelTC/LightLLM

sufubao · 2026-06-15T07:10:23Z

Make prefill cudagraph the default. This replaces the opt-in --enable_prefill_cudagraph flag with an opt-out --disable_prefill_cudagraph flag, and prefill cudagraph is automatically skipped when ep moe or dp prefill balance is enabled.
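A minimal sketch of what the flag flip means for argument parsing, using Python's standard argparse module. It assumes the new flag is a plain store_true switch. LightLLM's actual argument definition is not part of this excerpt, and the helper names build_parser and prefill_cudagraph_requested are hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical minimal parser containing only the flag discussed here;
    # the real LightLLM parser defines many more options.
    parser = argparse.ArgumentParser()
    # Opt-out switch: defaults to False when omitted, so the feature stays on.
    parser.add_argument("--disable_prefill_cudagraph", action="store_true")
    return parser


def prefill_cudagraph_requested(argv: list) -> bool:
    # Mirrors `enable_prefill_cudagraph = not args.disable_prefill_cudagraph` from the diff.
    args = build_parser().parse_args(argv)
    return not args.disable_prefill_cudagraph


if __name__ == "__main__":
    print(prefill_cudagraph_requested([]))                               # True: new default
    print(prefill_cudagraph_requested(["--disable_prefill_cudagraph"]))  # False: explicit opt-out
```

Under the old opt-in flag, an empty argv meant the feature was off; with the opt-out flag, the same empty argv turns it on. That is why launch scripts that simply omit the flag change behavior after this PR.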


gemini-code-assist

Code Review

This pull request changes the prefill CUDA Graph feature to be enabled by default, replacing the --enable_prefill_cudagraph option with --disable_prefill_cudagraph and updating the relevant models, CLI arguments, documentation, and tests. Feedback on these changes highlights two main issues: first, enabling this feature by default may cause initialization failures or crashes on unsupported models (such as ChatGLM or Baichuan), so a check should be added to restrict it to supported models; second, the test script changes make the baseline and test runs identical, so --disable_prefill_cudagraph should be explicitly added to the baseline run to maintain the distinction.


gemini-code-assist · 2026-06-15T07:11:59Z

+        args = get_env_start_args()
+        enable_prefill_cudagraph = not args.disable_prefill_cudagraph
+        # prefill cudagraph currently only supports llama/qwen-style models. It is incompatible with
+        # ep moe and dp prefill balance, so auto-skip those configs even when not explicitly disabled.
+        if enable_prefill_cudagraph and args.enable_ep_moe:
+            logger.warning("prefill cudagraph does not support ep moe, disabling it.")
+            enable_prefill_cudagraph = False
+        if enable_prefill_cudagraph and args.enable_dp_prefill_balance:
+            logger.warning("prefill cudagraph does not support dp prefill balance, disabling it.")
+            enable_prefill_cudagraph = False
+
        self.prefill_graph = (
-            None
-            if not get_env_start_args().enable_prefill_cudagraph
-            else PrefillCudaGraph(decode_cuda_graph=self.graph, tp_world_size=self.tp_world_size_)
+            PrefillCudaGraph(decode_cuda_graph=self.graph, tp_world_size=self.tp_world_size_)
+            if enable_prefill_cudagraph
+            else None
        )
        if self.prefill_graph is not None:
            if get_env_start_args().enable_prefill_microbatch_overlap:


Since prefill CUDA Graph is now enabled by default, it will be initialized and warmed up for every model. However, as the code comment notes, prefill CUDA Graph currently supports only llama/qwen-style models, so running it on unsupported models (e.g., ChatGLM, Baichuan) can cause initialization failures or runtime crashes during warmup.

We should add a check to restrict the default enablement to supported models (such as llama, qwen, yi, mistral). Additionally, we can use getattr for safer attribute access on the args object and reuse the args variable instead of calling get_env_start_args() multiple times.

        args = get_env_start_args()
        enable_prefill_cudagraph = not getattr(args, "disable_prefill_cudagraph", False)
        # prefill cudagraph currently only supports llama/qwen-style models.
        model_class_name = self.__class__.__name__.lower()
        is_supported_model = any(m in model_class_name for m in ["llama", "qwen", "yi", "mistral"])
        if enable_prefill_cudagraph and not is_supported_model:
            enable_prefill_cudagraph = False
        # It is incompatible with ep moe and dp prefill balance, so auto-skip those configs even when not explicitly disabled.
        if enable_prefill_cudagraph and getattr(args, "enable_ep_moe", False):
            logger.warning("prefill cudagraph does not support ep moe, disabling it.")
            enable_prefill_cudagraph = False
        if enable_prefill_cudagraph and getattr(args, "enable_dp_prefill_balance", False):
            logger.warning("prefill cudagraph does not support dp prefill balance, disabling it.")
            enable_prefill_cudagraph = False

        self.prefill_graph = (
            PrefillCudaGraph(decode_cuda_graph=self.graph, tp_world_size=self.tp_world_size_)
            if enable_prefill_cudagraph
            else None
        )
        if self.prefill_graph is not None:
            if getattr(args, "enable_prefill_microbatch_overlap", False):
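To make the combined gating rules easy to unit-test, the logic from the suggestion can be pulled out of the model class into a plain function. This is a hypothetical sketch, not a LightLLM API: resolve_prefill_cudagraph is an illustrative name, args is any object with the relevant attributes (a SimpleNamespace here), and the warning on the unsupported-model branch is an addition that the suggestion itself does not include.

```python
import logging
from types import SimpleNamespace

logger = logging.getLogger(__name__)

# Keyword allowlist taken from the review suggestion (substring match on the class name).
SUPPORTED_MODEL_KEYWORDS = ("llama", "qwen", "yi", "mistral")


def resolve_prefill_cudagraph(args, model_class_name: str) -> bool:
    """Hypothetical standalone version of the prefill cudagraph gating logic."""
    # Default on; getattr tolerates arg objects that lack the attribute.
    enabled = not getattr(args, "disable_prefill_cudagraph", False)
    name = model_class_name.lower()
    if enabled and not any(k in name for k in SUPPORTED_MODEL_KEYWORDS):
        logger.warning("prefill cudagraph does not support %s, disabling it.", model_class_name)
        enabled = False
    if enabled and getattr(args, "enable_ep_moe", False):
        logger.warning("prefill cudagraph does not support ep moe, disabling it.")
        enabled = False
    if enabled and getattr(args, "enable_dp_prefill_balance", False):
        logger.warning("prefill cudagraph does not support dp prefill balance, disabling it.")
        enabled = False
    return enabled


if __name__ == "__main__":
    print(resolve_prefill_cudagraph(SimpleNamespace(), "Qwen2TpPartModel"))     # True
    print(resolve_prefill_cudagraph(SimpleNamespace(), "ChatGlm2TpPartModel"))  # False
    print(resolve_prefill_cudagraph(SimpleNamespace(enable_ep_moe=True), "LlamaTpPartModel"))  # False
```

A design caveat: substring matching with a short keyword such as "yi" can match class names it was not meant to. An explicit set of supported model classes would be stricter, at the cost of updating it whenever a model is added.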

gemini-code-assist · 2026-06-15T07:11:59Z

 LOADWORKER=18 CUDA_VISIBLE_DEVICES=6,7 python -m lightllm.server.api_server \
 --model_dir /root/.cache/huggingface/hub/models--Qwen--Qwen3.5-0.8B/snapshots/2fc06364715b967f1860aea9cf38778875588b17 \
 --tp 2 \
---port 8089 \
---enable_prefill_cudagraph
+--port 8089


Removing --enable_prefill_cudagraph from this second test run makes it identical to the first test run, which has no prefill cudagraph flags and therefore now also runs with prefill cudagraph enabled by default.

To preserve the distinction between the baseline test (without prefill cudagraph) and the prefill cudagraph test, we should add --disable_prefill_cudagraph to this second run (making it the baseline test) and keep the first run as the default prefill cudagraph test.

Suggested change

LOADWORKER=18 CUDA_VISIBLE_DEVICES=6,7 python -m lightllm.server.api_server \
--model_dir /root/.cache/huggingface/hub/models--Qwen--Qwen3.5-0.8B/snapshots/2fc06364715b967f1860aea9cf38778875588b17 \
--tp 2 \
--port 8089 \
--disable_prefill_cudagraph
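The baseline/test split the reviewer asks for comes down to two launches that differ only in the opt-out flag. This Python sketch builds both argument lists from one helper so the difference cannot silently disappear again. It does not start a server. The helper names server_cmd and server_env are hypothetical, while the model path, tp size, port, and environment variables are copied from the test script.

```python
import os
import shlex

MODEL_DIR = (
    "/root/.cache/huggingface/hub/models--Qwen--Qwen3.5-0.8B/"
    "snapshots/2fc06364715b967f1860aea9cf38778875588b17"
)


def server_cmd(disable_prefill_cudagraph: bool) -> list[str]:
    # Shared arguments for both runs; only the opt-out flag differs.
    cmd = [
        "python", "-m", "lightllm.server.api_server",
        "--model_dir", MODEL_DIR,
        "--tp", "2",
        "--port", "8089",
    ]
    if disable_prefill_cudagraph:
        cmd.append("--disable_prefill_cudagraph")  # baseline: no prefill cudagraph
    return cmd


def server_env() -> dict:
    # Same environment prefix as the shell commands in the test script.
    return {**os.environ, "LOADWORKER": "18", "CUDA_VISIBLE_DEVICES": "6,7"}


if __name__ == "__main__":
    prefill_graph_run = server_cmd(disable_prefill_cudagraph=False)  # default after this PR
    baseline_run = server_cmd(disable_prefill_cudagraph=True)
    assert prefill_graph_run != baseline_run, "baseline and test runs must differ"
    print(shlex.join(baseline_run))
    # To actually launch: subprocess.Popen(baseline_run, env=server_env())
```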

feat: enable prefill cudagraph by default, add --disable_prefill_cudagraph

2dcd49d

sufubao wants to merge 1 commit into ModelTC:main from sufubao:prefill_cudagraph
