refactor: install vLLM ROCm from upstream wheels#923
Conversation
Replaces the build-from-source approach (introduced in #917) with the upstream vLLM ROCm wheels published at https://wheels.vllm.ai/rocm/, as suggested by @ericcurtin. Builds now take ~3-5 min instead of ~30-60 min, and the runtime image keeps the llamacpp ROCm base — restoring llama.cpp as an in-image fallback (matching the shape of the CUDA vllm variant). vLLM ROCm wheels follow a separate release cadence from CUDA, so VLLM_ROCM_VERSION (default 0.21.0) is tracked independently from VLLM_VERSION (CUDA, default 0.19.1). Refs #412
There was a problem hiding this comment.
Hey - I've found 1 issue, and left some high level feedback:
- Consider explicitly pinning the vLLM package to VLLM_ROCM_VERSION in the uv install step (e.g.,
vllm==${VLLM_ROCM_VERSION}) so the wheel index version and the installed package version cannot drift apart. - In the
apt installstep, you can add--no-install-recommendsto reduce the image size and avoid pulling in unnecessary packages.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- Consider explicitly pinning the vLLM package to VLLM_ROCM_VERSION in the uv install step (e.g., `vllm==${VLLM_ROCM_VERSION}`) so the wheel index version and the installed package version cannot drift apart.
- In the `apt install` step, you can add `--no-install-recommends` to reduce the image size and avoid pulling in unnecessary packages.
## Individual Comments
### Comment 1
<location path="Dockerfile" line_range="183-184" />
<code_context>
+RUN curl -LsSf https://astral.sh/uv/install.sh | sh \
+ && ~/.local/bin/uv venv --python 3.12 /opt/vllm-env \
+ && . /opt/vllm-env/bin/activate \
+ && ~/.local/bin/uv pip install vllm \
+ --extra-index-url https://wheels.vllm.ai/rocm/${VLLM_ROCM_VERSION}/${VLLM_ROCM_TARGET} \
+ --index-strategy unsafe-best-match
</code_context>
<issue_to_address>
**issue (bug_risk):** Tie the installed vLLM package version to VLLM_ROCM_VERSION to avoid pulling a mismatched wheel from PyPI.
Because `uv pip install vllm` resolves against PyPI first, it may select a newer vLLM without ROCm wheels and then fall back to non-ROCm wheels, silently breaking the ROCm build and reproducibility. Please pin the version to match the ROCm wheel path, e.g.:
```dockerfile
&& ~/.local/bin/uv pip install "vllm==${VLLM_ROCM_VERSION}" \
--extra-index-url https://wheels.vllm.ai/rocm/${VLLM_ROCM_VERSION}/${VLLM_ROCM_TARGET} \
--index-strategy unsafe-best-match
```
so both the version and wheel source are driven by `VLLM_ROCM_VERSION`.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
There was a problem hiding this comment.
Code Review
Review: vLLM ROCm Wheel Integration\n\n### Summary\nThis PR refactors the vLLM ROCm build process to use pre-built wheels from the official repository instead of building from source, which significantly reduces build time and aligns the architecture with the CUDA variant.\n\n### Critical\n- Dockerfile:184: The --extra-index-url is likely malformed as the repository is typically flat and the ROCm target version (rocm722) appears to be a typo. Fix: --extra-index-url https://wheels.vllm.ai/rocm
| && ~/.local/bin/uv venv --python 3.12 /opt/vllm-env \ | ||
| && . /opt/vllm-env/bin/activate \ | ||
| && ~/.local/bin/uv pip install vllm \ | ||
| --extra-index-url https://wheels.vllm.ai/rocm/${VLLM_ROCM_VERSION}/${VLLM_ROCM_TARGET} \ |
There was a problem hiding this comment.
The extra index URL appears to be malformed. The official vLLM ROCm wheel repository at wheels.vllm.ai/rocm is a flat repository and typically does not use versioned subdirectories in the path. Furthermore, versioned paths in the vLLM project usually require a v prefix (e.g., v0.21.0). Additionally, rocm722 is likely a typo for a current ROCm version like rocm61 or rocm62. Using an invalid URL will cause the build to fail.
--extra-index-url https://wheels.vllm.ai/rocm \
|
@dmedovich sometimes the bots are wrong, but can you confirm if they are right or wrong? |
…back
Without pinning, uv resolves vllm against both PyPI and our extra ROCm
wheel index. If a newer vLLM lands on PyPI before the matching ROCm
wheel is published there, --index-strategy unsafe-best-match would pick
the higher PyPI version — silently installing a CUDA-only wheel into
the ROCm image.
Pinning vllm==${VLLM_ROCM_VERSION} ensures both the version and the
wheel source are driven by VLLM_ROCM_VERSION.
|
@ericcurtin happy to confirm: Gemini's review — wrong on all three points:
Sourcery's review — correct, fixed in last commit (2dbc908): Without pinning |
Replaces the build-from-source approach (introduced in #917) with the upstream vLLM ROCm wheels published at https://wheels.vllm.ai/rocm/, as suggested by @ericcurtin.
Builds now take ~3-5 min instead of ~30-60 min, and the runtime image keeps the llamacpp ROCm base — restoring llama.cpp as an in-image fallback (matching the shape of the CUDA vllm variant).
vLLM ROCm wheels follow a separate release cadence from CUDA, so VLLM_ROCM_VERSION (default 0.21.0) is tracked independently from VLLM_VERSION (CUDA, default 0.19.1).
Refs #412