vLLM RuntimeError: Engine core initialization failed: Complete Troubleshooting and Fix Guide


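Understanding the Error

The stack trace ends with: RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}

Meaning: the EngineCore child process spawned by the vLLM main process failed to start and exited early. The log shown so far contains only the main-process stack; the real crash reason is in the child process's stdout/stderr, so you need to capture the complete log before you can locate the root cause.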

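I. Step 1: Capture the child process's real error (required)

1. Redirect the full log at startup (Linux/WSL)

# Write all stdout/stderr to a log file, including the child process's crash stack
vllm serve /path/to/your/model \
--tensor-parallel-size 1 \
--gpu-memory-utilization 0.85 \
> vllm_full.log 2>&1

After the crash, open vllm_full.log and search upward from the end for ERROR, Traceback, CUDA error, OOM, out of memory, or cannot allocate memory. The line that matches is the actual root cause.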
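The keyword search can also be scripted. A minimal sketch, assuming the log was written to vllm_full.log as in the command above; find_root_cause is a hypothetical helper name, not part of vLLM:

```bash
# Hypothetical helper: print every log line that matches the failure keywords,
# prefixed with its line number, so the earliest match is easy to spot.
find_root_cause() {
    local log_file="$1"
    grep -nE 'ERROR|Traceback|CUDA error|OOM|out of memory|cannot allocate memory' "$log_file"
}

# Usage (after a crash):
#   find_root_cause vllm_full.log | head -n 20
```

The match is case-sensitive on purpose, so the final RuntimeError line itself is not reported and only the lines above it show up.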

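2. Temporarily enable debug logging for child-process details

Prefix the start command with vLLM's logging environment variable:

VLLM_LOGGING_LEVEL=DEBUG vllm serve /path/to/your/model > vllm_full.log 2>&1

Debug logs cover the whole startup sequence: child-process creation, handshake, GPU memory allocation, and model loading.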

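II. Common root causes and their fixes (ordered by likelihood)

Scenario 1: Insufficient GPU memory (most common)

Log signature: the child-process log contains CUDA out of memory / cannot allocate tensor

Fixes:

  1. Lower the GPU memory utilization fraction: --gpu-memory-utilization 0.7
  2. Disable CUDA graph capture, load a quantized checkpoint, and shrink the context length: --enforce-eager --quantization gptq (or awq / squeezellm, matching how the model was quantized) --max-model-len 8192
  3. Cap batch concurrency: --max-num-batched-tokens 4096 --max-num-seqs 16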
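Before picking a value for --gpu-memory-utilization, it helps to see how much GPU memory is actually free. A rough sketch, assuming nvidia-smi is installed; the helper name suggest_gpu_util and the 10% headroom factor are illustrative assumptions, not vLLM recommendations:

```bash
# Hypothetical helper: read "total, used" (MiB) lines from stdin, one per GPU,
# and print a candidate --gpu-memory-utilization that fits in the free memory.
# The 0.9 factor leaves 10% headroom; it is an arbitrary assumption.
suggest_gpu_util() {
    awk -F', *' '{ printf "%.2f\n", (($1 - $2) / $1) * 0.9 }'
}

# Usage:
#   nvidia-smi --query-gpu=memory.total,memory.used --format=csv,noheader,nounits | suggest_gpu_util
```

If the suggested fraction is too small to hold the model weights, lowering --max-model-len or using a quantized checkpoint is the next thing to try.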

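Scenario 2: CUDA / PyTorch / vLLM version mismatch

Environment signature: Python 3.14 + a recent vLLM + an old CUDA driver

  1. Check the compatibility requirements
    • vLLM is most stable on Python 3.10/3.11. Python 3.14 is not fully supported yet and is prone to bugs in multiprocess communication, uvloop, and torch subprocess compatibility
    • The CUDA version supported by the driver must be ≥ the CUDA version vLLM was built against
  2. Recommended fixes
    • Downgrade Python to 3.11 and rebuild the virtual environment
    • Reinstall matching versions of torch and vllm: pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124, then pip install -U vllm==0.6.3 (a stable release)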
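To see which combination is actually installed, print the versions side by side. A small sketch; python_is_recommended is a hypothetical helper that encodes the 3.10/3.11 recommendation from the list above, and the commented commands assume torch and nvidia-smi are available:

```bash
# Hypothetical helper: succeed only for the Python versions recommended above.
python_is_recommended() {
    case "$1" in
        3.10|3.10.*|3.11|3.11.*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage:
#   py_ver=$(python3 -c 'import platform; print(platform.python_version())')
#   python_is_recommended "$py_ver" || echo "Python $py_ver is outside 3.10/3.11"
#   python3 -c 'import torch; print(torch.__version__, torch.version.cuda)'
#   nvidia-smi --query-gpu=driver_version --format=csv,noheader
```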

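Scenario 3: WSL2-specific problems (the /mnt/c/Users/ paths in your traceback confirm WSL)

Classic WSL pitfalls:

  1. Insufficient shared GPU memory or cgroup permission problems under WSL2. Limit per-process GPU memory and switch from uvloop to standard asyncio: vllm serve /path/to/your/model --disable-uvloop --gpu-memory-utilization 0.65
  2. Reading the model from the shared /mnt/c disk is slow, and the load can time out and break the child-process handshake. Fix: copy the model to a path inside WSL (~/models/xxx) instead of loading it from the mounted C: drive
  3. WSL multiprocess IPC failures. Append the flag that disables the multiprocess core: --single-process

Whether --disable-uvloop and --single-process are accepted depends on the vLLM version; run vllm serve --help to confirm before relying on them.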
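The "move the model off /mnt/c" advice can be checked mechanically. A minimal sketch; is_windows_mount is a hypothetical helper, and the source path in the usage comment is only an example:

```bash
# Hypothetical helper: succeed if a path lives on a Windows drive mounted into WSL
# (for example /mnt/c/...), which is the slow case described above.
is_windows_mount() {
    case "$1" in
        /mnt/[a-z]/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage: copy the model into the WSL filesystem before serving it.
#   src=/mnt/c/path/to/your/model   # example path, adjust to yours
#   if is_windows_mount "$src"; then
#       mkdir -p ~/models && cp -r "$src" ~/models/
#   fi
```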

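Scenario 4: Model files corrupted, missing, or unreadable

Log signature: the child process reports safetensors load error or file not found

  1. Verify the integrity of the model files and re-download the model if needed
  2. Permission problems with files on the WSL-mounted C: drive: copy the model to a local Linux directory before starting
  3. Add --trust-remote-code (required for models with a custom architecture)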
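A quick presence check catches the "file not found" class of failures before vLLM starts. This is a sketch only: check_model_dir is a hypothetical helper, it assumes a Hugging Face style directory with config.json and *.safetensors weights, and it does not detect corrupted file contents:

```bash
# Hypothetical helper: basic sanity check for a local model directory.
# Assumes config.json plus at least one *.safetensors weight file.
check_model_dir() {
    local dir="$1"
    [ -r "$dir/config.json" ] || { echo "missing or unreadable: $dir/config.json"; return 1; }
    ls "$dir"/*.safetensors >/dev/null 2>&1 || { echo "no .safetensors files in $dir"; return 1; }
    echo "ok: $dir"
}

# Usage:
#   check_model_dir ~/models/your-model
```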

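Scenario 5: Multiprocess / uvloop event-loop conflict (the stack is full of uvloop frames)

uvloop.run appears throughout your stack trace, and uvloop has poor compatibility with the multiprocess vLLM v1 engine.

Temporary fix: disable uvloop, if your vLLM version exposes the flag (check vllm serve --help):

--disable-uvloop

The vLLM v1 AsyncMPClient combined with multiprocessing and uvloop is prone to child-process handshake timeouts and core engine startup failures.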
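Since flag availability varies between releases, it is worth checking the help text before adding a flag to the start command. A minimal sketch; help_lists_flag is a hypothetical helper, and it assumes the help text piped into it lists every option:

```bash
# Hypothetical helper: read CLI help text on stdin and succeed if it mentions
# the given flag as a whole word (so --foo does not match --foo-bar).
help_lists_flag() {
    grep -qE -- "(^|[^[:alnum:]-])$1([^[:alnum:]-]|\$)"
}

# Usage:
#   if vllm serve --help 2>&1 | help_lists_flag --disable-uvloop; then
#       echo "flag available"
#   else
#       echo "flag not listed; leave it out"
#   fi
```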

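Scenario 6: Wrong tensor-parallel configuration (TP > 1 on a single GPU)

On a single-GPU machine, do not pass --tensor-parallel-size 2: it makes vLLM launch multiple core child processes, and startup fails.

For a single GPU, always use: --tensor-parallel-size 1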
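The check can be automated by comparing the requested size with the number of GPUs the driver reports. A small sketch, assuming nvidia-smi -L prints one line per GPU (MIG setups print extra lines); check_tp_size is a hypothetical helper:

```bash
# Hypothetical helper: compare a requested tensor-parallel size with the GPU count.
check_tp_size() {
    local tp="$1" gpus="$2"
    if [ "$tp" -gt "$gpus" ]; then
        echo "tensor-parallel-size $tp exceeds $gpus visible GPU(s)"
        return 1
    fi
    echo "tensor-parallel-size $tp is fine for $gpus GPU(s)"
}

# Usage:
#   check_tp_size 2 "$(nvidia-smi -L | wc -l)"
```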

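Scenario 7: Insufficient system memory (RAM); the child process is killed by the OOM killer

Check system memory on Linux/WSL: free -h

Besides GPU memory, vLLM needs CPU memory to load a model; keep at least 8 to 16 GB free.

Mitigation flags:

--swap-space 4 --enforce-eager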
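To compare the available RAM against the 8 to 16 GB guideline and to confirm whether the OOM killer was involved, something like the following can help. A sketch only; mem_available_gib is a hypothetical helper that parses /proc/meminfo-style text:

```bash
# Hypothetical helper: read /proc/meminfo-style text on stdin and print
# MemAvailable in GiB (the kernel reports the value in kB).
mem_available_gib() {
    awk '/^MemAvailable:/ { printf "%.1f\n", $2 / 1048576 }'
}

# Usage:
#   mem_available_gib < /proc/meminfo
#   # Look for OOM-killer activity in the kernel log (may need sudo):
#   dmesg | grep -iE 'out of memory|killed process'
```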

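III. Minimal working start command (fallback for WSL + Python 3.14)

Replace the model path and run it directly; it sidesteps most compatibility problems. Drop --disable-uvloop or --single-process if vllm serve --help does not list them for your version:

VLLM_LOGGING_LEVEL=DEBUG vllm serve /home/AI-Space001/models/your-model \
--host 0.0.0.0 \
--port 8000 \
--tensor-parallel-size 1 \
--gpu-memory-utilization 0.65 \
--disable-uvloop \
--single-process \
--trust-remote-code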
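Once the server is started in the background, a simple polling loop shows whether it came up. A minimal sketch; wait_for is a hypothetical generic retry helper, and the usage line assumes the server listens on port 8000 as configured above and answers on its /health endpoint:

```bash
# Hypothetical helper: retry a command once per second until it succeeds
# or the attempt budget runs out.
wait_for() {
    local attempts="$1"
    shift
    local i=0
    while [ "$i" -lt "$attempts" ]; do
        "$@" && return 0
        i=$((i + 1))
        sleep 1
    done
    return 1
}

# Usage:
#   wait_for 120 curl -sf http://localhost:8000/health && echo "server is up"
```

If the loop times out, go back to the captured log file; the engine most likely died during startup.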

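IV. Quick triage summary

  1. Capture the complete child-process log with > vllm.log 2>&1
  2. Search upward in the log for CUDA OOM, file errors, or torch version errors
  3. On WSL, try --disable-uvloop --single-process first, if your vLLM version accepts them
  4. Python 3.14 is unstable here; switch to 3.11, rebuild the venv, and reinstall vLLM
  5. Move the model into a WSL-local directory rather than reading it from /mnt/c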
Comments (1)
一路向北
2026-06-26 18:21:30

(APIServer pid=3720) Traceback (most recent call last):
(APIServer pid=3720) File "/mnt/c/Users/AI-Space001/venv-vllm/bin/vllm", line 8, in <module>
(APIServer pid=3720) sys.exit(main())
(APIServer pid=3720) ~~~~^^
(APIServer pid=3720) File "/mnt/c/Users/AI-Space001/venv-vllm/lib/python3.14/site-packages/vllm/entrypoints/cli/main.py", line 92, in main
(APIServer pid=3720) args.dispatch_function(args)
(APIServer pid=3720) ~~~~~~~~~~~~~~~~~~~~~~^^^^^^
(APIServer pid=3720) File "/mnt/c/Users/AI-Space001/venv-vllm/lib/python3.14/site-packages/vllm/entrypoints/cli/serve.py", line 148, in cmd
(APIServer pid=3720) uvloop.run(run_server(args))
(APIServer pid=3720) ~~~~~~~~~~^^^^^^^^^^^^^^^^^^
(APIServer pid=3720) File "/mnt/c/Users/AI-Space001/venv-vllm/lib/python3.14/site-packages/uvloop/__init__.py", line 96, in run
(APIServer pid=3720) return __asyncio.run(
(APIServer pid=3720) ~~~~~~~~~~~~~^
(APIServer pid=3720) wrapper(),
(APIServer pid=3720) ^^^^^^^^^^
(APIServer pid=3720) ……
(APIServer pid=3720) **run_kwargs
(APIServer pid=3720) ^^^^^^^^^^^^
(APIServer pid=3720) )
(APIServer pid=3720) ^
(APIServer pid=3720) File "/usr/lib/python3.14/asyncio/runners.py", line 204, in run
(APIServer pid=3720) return runner.run(main)
(APIServer pid=3720) ~~~~~~~~~~^^^^^^
(APIServer pid=3720) File "/usr/lib/python3.14/asyncio/runners.py", line 127, in run
(APIServer pid=3720) return self._loop.run_until_complete(task)
(APIServer pid=3720) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^
(APIServer pid=3720) File "uvloop/loop.pyx", line 1512, in uvloop.loop.Loop.run_until_complete
(APIServer pid=3720) File "uvloop/loop.pyx", line 1505, in uvloop.loop.Loop.run_until_complete
(APIServer pid=3720) self.run_forever()
(APIServer pid=3720) File "uvloop/loop.pyx", line 1379, in uvloop.loop.Loop.run_forever
(APIServer pid=3720) self._run(mode)
(APIServer pid=3720) File "uvloop/loop.pyx", line 557, in uvloop.loop.Loop._run
(APIServer pid=3720) raise self._last_error
(APIServer pid=3720) File "uvloop/loop.pyx", line 476, in uvloop.loop.Loop._on_idle
(APIServer pid=3720) handler._run()
(APIServer pid=3720) File "uvloop/cbhandles.pyx", line 83, in uvloop.loop.Handle._run
(APIServer pid=3720) File "uvloop/cbhandles.pyx", line 63, in uvloop.loop.Handle._run
(APIServer pid=3720) callback(*args)
(APIServer pid=3720) File "/mnt/c/Users/AI-Space001/venv-vllm/lib/python3.14/site-packages/uvloop/__init__.py", line 48, in wrapper
(APIServer pid=3720) return await main
(APIServer pid=3720) ^^^^^^^^^^
(APIServer pid=3720) File "/mnt/c/Users/AI-Space001/venv-vllm/lib/python3.14/site-packages/vllm/entrypoints/openai/api_server.py", line 678, in run_server
(APIServer pid=3720) await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
(APIServer pid=3720) File "/mnt/c/Users/AI-Space001/venv-vllm/lib/python3.14/site-packages/vllm/entrypoints/openai/api_server.py", line 696, in run_server_worker
(APIServer pid=3720) shutdown_task = await build_and_serve(
(APIServer pid=3720) ^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=3720) engine_client, listen_address, sock, args, **uvicorn_kwargs
(APIServer pid=3720) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=3720) )
(APIServer pid=3720) ^
(APIServer pid=3720) File "/mnt/c/Users/AI-Space001/venv-vllm/lib/python3.14/site-packages/vllm/entrypoints/openai/api_server.py", line 594, in build_and_serve
(APIServer pid=3720) await init_app_state(engine_client, app.state, args, supported_tasks)
(APIServer pid=3720) File "/mnt/c/Users/AI-Space001/venv-vllm/lib/python3.14/site-packages/vllm/entrypoints/openai/api_server.py", line 407, in init_app_state
(APIServer pid=3720) await init_generate_state(
(APIServer pid=3720) engine_client, state, args, request_logger, supported_tasks
(APIServer pid=3720) )
(APIServer pid=3720) File "/mnt/c/Users/AI-Space001/venv-vllm/lib/python3.14/site-packages/vllm/entrypoints/openai/generate/api_router.py", line 140, in init_generate_state
(APIServer pid=3720) state.openai_serving_chat.warmup()
(APIServer pid=3720) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^
(APIServer pid=3720) File "/mnt/c/Users/AI-Space001/venv-vllm/lib/python3.14/site-packages/vllm/entrypoints/openai/chat_completion/serving.py", line 181, in warmup
(APIServer pid=3720) self.renderer.warmup(
(APIServer pid=3720) ~~~~~~~~~~~~~~~~~~~~^
(APIServer pid=3720) ChatParams(
(APIServer pid=3720) ^^^^^^^^^^^
(APIServer pid=3720) ……
(APIServer pid=3720) )
(APIServer pid=3720) ^
(APIServer pid=3720) )
(APIServer pid=3720) ^
(APIServer pid=3720) File "/mnt/c/Users/AI-Space001/venv-vllm/lib/python3.14/site-packages/vllm/renderers/base.py", line 251, in warmup
(APIServer pid=3720) self._warmup_mm_processor(
(APIServer pid=3720) ~~~~~~~~~~~~~~~~~~~~~~~~~^
(APIServer pid=3720) self.mm_processor,
(APIServer pid=3720) ^^^^^^^^^^^^^^^^^^
(APIServer pid=3720) log_prefix="Multi-modal",
(APIServer pid=3720) ^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=3720) )
(APIServer pid=3720) ^
(APIServer pid=3720) File "/mnt/c/Users/AI-Space001/venv-vllm/lib/python3.14/site-packages/vllm/renderers/base.py", line 221, in _warmup_mm_processor
(APIServer pid=3720) _ = processor.apply(processor_inputs, timing_ctx=TimingContext(enabled=False))
(APIServer pid=3720) File "/mnt/c/Users/AI-Space001/venv-vllm/lib/python3.14/site-packages/vllm/multimodal/processing/processor.py", line 1685, in apply
(APIServer pid=3720) ) = self._cached_apply_hf_processor(inputs, timing_ctx)
(APIServer pid=3720) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^
(APIServer pid=3720) File "/mnt/c/Users/AI-Space001/venv-vllm/lib/python3.14/site-packages/vllm/multimodal/processing/processor.py", line 1474, in _cached_apply_hf_processor
(APIServer pid=3720) ) = self._apply_hf_processor_main(
(APIServer pid=3720) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
(APIServer pid=3720) prompt=inputs.prompt,
(APIServer pid=3720) ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=3720) ……
(APIServer pid=3720) enable_hf_prompt_update=False,
(APIServer pid=3720) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=3720) )
(APIServer pid=3720) ^
(APIServer pid=3720) File "/mnt/c/Users/AI-Space001/venv-vllm/lib/python3.14/site-packages/vllm/multimodal/processing/processor.py", line 1291, in _apply_hf_processor_main
(APIServer pid=3720) mm_processed_data = self._apply_hf_processor_mm_only(
(APIServer pid=3720) mm_items=mm_items,
(APIServer pid=3720) hf_processor_mm_kwargs=hf_processor_mm_kwargs,
(APIServer pid=3720) tokenization_kwargs=tokenization_kwargs,
(APIServer pid=3720) )
(APIServer pid=3720) File "/mnt/c/Users/AI-Space001/venv-vllm/lib/python3.14/site-packages/vllm/multimodal/processing/processor.py", line 1232, in _apply_hf_processor_mm_only
(APIServer pid=3720) _, mm_processed_data, _ = self._apply_hf_processor_text_mm(
(APIServer pid=3720) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
(APIServer pid=3720) prompt_text=self.dummy_inputs.get_dummy_text(mm_counts),
(APIServer pid=3720) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=3720) ……
(APIServer pid=3720) tokenization_kwargs=tokenization_kwargs,
(APIServer pid=3720) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=3720) )
(APIServer pid=3720) ^
(APIServer pid=3720) File "/mnt/c/Users/AI-Space001/venv-vllm/lib/python3.14/site-packages/vllm/multimodal/processing/processor.py", line 1153, in _apply_hf_processor_text_mm
(APIServer pid=3720) processed_data = self._call_hf_processor(
(APIServer pid=3720) prompt=prompt_text,
(APIServer pid=3720) ……
(APIServer pid=3720) tok_kwargs=tokenization_kwargs,
(APIServer pid=3720) )
(APIServer pid=3720) File "/mnt/c/Users/AI-Space001/venv-vllm/lib/python3.14/site-packages/vllm/model_executor/models/qwen3_vl.py", line 1265, in _call_hf_processor
(APIServer pid=3720) video_outputs = super()._call_hf_processor(
(APIServer pid=3720) prompt="",
(APIServer pid=3720) ……
(APIServer pid=3720) tok_kwargs=tok_kwargs,
(APIServer pid=3720) )
(APIServer pid=3720) File "/mnt/c/Users/AI-Space001/venv-vllm/lib/python3.14/site-packages/vllm/multimodal/processing/processor.py", line 1110, in _call_hf_processor
(APIServer pid=3720) return self.info.ctx.call_hf_processor(
(APIServer pid=3720) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
(APIServer pid=3720) self.info.get_hf_processor(**mm_kwargs),
(APIServer pid=3720) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=3720) dict(text=prompt, **mm_data),
(APIServer pid=3720) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=3720) dict(**mm_kwargs, **tok_kwargs),
(APIServer pid=3720) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=3720) )
(APIServer pid=3720) ^
(APIServer pid=3720) File "/mnt/c/Users/AI-Space001/venv-vllm/lib/python3.14/site-packages/vllm/multimodal/processing/context.py", line 269, in call_hf_processor
(APIServer pid=3720) output = hf_processor(**data, **allowed_kwargs)
(APIServer pid=3720) File "/mnt/c/Users/AI-Space001/venv-vllm/lib/python3.14/site-packages/transformers/processing_utils.py", line 668, in __call__
(APIServer pid=3720) processed_videos, videos_replacements = self._process_videos(videos, **merged_kwargs["videos_kwargs"])
(APIServer pid=3720) ~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=3720) File "/mnt/c/Users/AI-Space001/venv-vllm/lib/python3.14/site-packages/transformers/processing_utils.py", line 770, in _process_videos
(APIServer pid=3720) processed_videos = self.video_processor(videos, **kwargs)
(APIServer pid=3720) File "/mnt/c/Users/AI-Space001/venv-vllm/lib/python3.14/site-packages/transformers/video_processing_utils.py", line 178, in __call__
(APIServer pid=3720) return self.preprocess(videos, **kwargs)
(APIServer pid=3720) ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^
(APIServer pid=3720) File "/mnt/c/Users/AI-Space001/venv-vllm/lib/python3.14/site-packages/transformers/video_processing_utils.py", line 369, in preprocess
(APIServer pid=3720) preprocessed_videos = self._preprocess(videos=videos, **kwargs)
(APIServer pid=3720) File "/mnt/c/Users/AI-Space001/venv-vllm/lib/python3.14/site-packages/transformers/models/qwen3_vl/video_processing_qwen3_vl.py", line 223, in _preprocess
(APIServer pid=3720) stacked_videos = self.rescale_and_normalize(
(APIServer pid=3720) stacked_videos, do_rescale, rescale_factor, do_normalize, image_mean, image_std
(APIServer pid=3720) )
(APIServer pid=3720) File "/mnt/c/Users/AI-Space001/venv-vllm/lib/python3.14/site-packages/transformers/image_processing_backends.py", line 346, in rescale_and_normalize
(APIServer pid=3720) images = self.normalize(images.to(dtype=torch.float32), image_mean, image_std)
(APIServer pid=3720) File "/mnt/c/Users/AI-Space001/venv-vllm/lib/python3.14/site-packages/transformers/image_processing_backends.py", line 300, in normalize
(APIServer pid=3720) def normalize(
(APIServer pid=3720)
(APIServer pid=3720) File "/mnt/c/Users/AI-Space001/venv-vllm/lib/python3.14/site-packages/vllm/entrypoints/openai/api_server.py", line 673, in _interrupt_init
(APIServer pid=3720) raise KeyboardInterrupt("terminated")
(APIServer pid=3720) KeyboardInterrupt: terminated
