refactor: install vLLM ROCm from upstream wheels by dmedovich · Pull Request #923 · docker/model-runner

dmedovich · 2026-05-19T15:45:09Z

Replaces the build-from-source approach (introduced in #917) with the upstream vLLM ROCm wheels published at https://wheels.vllm.ai/rocm/, as suggested by @ericcurtin.

Builds now take ~3-5 min instead of ~30-60 min, and the runtime image keeps the llamacpp ROCm base — restoring llama.cpp as an in-image fallback (matching the shape of the CUDA vllm variant).

vLLM ROCm wheels follow a separate release cadence from CUDA, so VLLM_ROCM_VERSION (default 0.21.0) is tracked independently from VLLM_VERSION (CUDA, default 0.19.1).

Refs #412

@ericcurtin

Replaces the build-from-source approach (introduced in #917) with the upstream vLLM ROCm wheels published at https://wheels.vllm.ai/rocm/, as suggested by @ericcurtin. Builds now take ~3-5 min instead of ~30-60 min, and the runtime image keeps the llamacpp ROCm base — restoring llama.cpp as an in-image fallback (matching the shape of the CUDA vllm variant). vLLM ROCm wheels follow a separate release cadence from CUDA, so VLLM_ROCM_VERSION (default 0.21.0) is tracked independently from VLLM_VERSION (CUDA, default 0.19.1). Refs #412

sourcery-ai

Hey - I've found 1 issue, and left some high level feedback:

Consider explicitly pinning the vLLM package to VLLM_ROCM_VERSION in the uv install step (e.g., vllm==${VLLM_ROCM_VERSION}) so the wheel index version and the installed package version cannot drift apart.
In the apt install step, you can add --no-install-recommends to reduce the image size and avoid pulling in unnecessary packages.

Prompt for AI Agents

Please address the comments from this code review:

## Overall Comments
- Consider explicitly pinning the vLLM package to VLLM_ROCM_VERSION in the uv install step (e.g., `vllm==${VLLM_ROCM_VERSION}`) so the wheel index version and the installed package version cannot drift apart.
- In the `apt install` step, you can add `--no-install-recommends` to reduce the image size and avoid pulling in unnecessary packages.

## Individual Comments

### Comment 1
<location path="Dockerfile" line_range="183-184" />
<code_context>
+RUN curl -LsSf https://astral.sh/uv/install.sh | sh \
+    && ~/.local/bin/uv venv --python 3.12 /opt/vllm-env \
+    && . /opt/vllm-env/bin/activate \
+    && ~/.local/bin/uv pip install vllm \
+         --extra-index-url https://wheels.vllm.ai/rocm/${VLLM_ROCM_VERSION}/${VLLM_ROCM_TARGET} \
+         --index-strategy unsafe-best-match

</code_context>
<issue_to_address>
**issue (bug_risk):** Tie the installed vLLM package version to VLLM_ROCM_VERSION to avoid pulling a mismatched wheel from PyPI.

Because `uv pip install vllm` resolves against PyPI first, it may select a newer vLLM without ROCm wheels and then fall back to non-ROCm wheels, silently breaking the ROCm build and reproducibility. Please pin the version to match the ROCm wheel path, e.g.:

```dockerfile
&& ~/.local/bin/uv pip install "vllm==${VLLM_ROCM_VERSION}" \
     --extra-index-url https://wheels.vllm.ai/rocm/${VLLM_ROCM_VERSION}/${VLLM_ROCM_TARGET} \
     --index-strategy unsafe-best-match
```

so both the version and wheel source are driven by `VLLM_ROCM_VERSION`.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨

_{Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.}

gemini-code-assist

Code Review

Review: vLLM ROCm Wheel Integration\n\n### Summary\nThis PR refactors the vLLM ROCm build process to use pre-built wheels from the official repository instead of building from source, which significantly reduces build time and aligns the architecture with the CUDA variant.\n\n### Critical\n- Dockerfile:184: The --extra-index-url is likely malformed as the repository is typically flat and the ROCm target version (rocm722) appears to be a typo. Fix: --extra-index-url https://wheels.vllm.ai/rocm

gemini-code-assist · 2026-05-19T15:48:41Z

+    && ~/.local/bin/uv venv --python 3.12 /opt/vllm-env \
+    && . /opt/vllm-env/bin/activate \
+    && ~/.local/bin/uv pip install vllm \
+         --extra-index-url https://wheels.vllm.ai/rocm/${VLLM_ROCM_VERSION}/${VLLM_ROCM_TARGET} \


The extra index URL appears to be malformed. The official vLLM ROCm wheel repository at wheels.vllm.ai/rocm is a flat repository and typically does not use versioned subdirectories in the path. Furthermore, versioned paths in the vLLM project usually require a v prefix (e.g., v0.21.0). Additionally, rocm722 is likely a typo for a current ROCm version like rocm61 or rocm62. Using an invalid URL will cause the build to fail.

--extra-index-url https://wheels.vllm.ai/rocm \

ericcurtin · 2026-05-19T20:09:20Z

@dmedovich sometimes the bots are wrong, but can you confirm if they are right or wrong?

…back Without pinning, uv resolves vllm against both PyPI and our extra ROCm wheel index. If a newer vLLM lands on PyPI before the matching ROCm wheel is published there, --index-strategy unsafe-best-match would pick the higher PyPI version — silently installing a CUDA-only wheel into the ROCm image. Pinning vllm==${VLLM_ROCM_VERSION} ensures both the version and the wheel source are driven by VLLM_ROCM_VERSION.

dmedovich · 2026-05-19T21:25:41Z

@ericcurtin happy to confirm:

Gemini's review — wrong on all three points:

"wheels.vllm.ai/rocm is a flat repo without versioned subdirs" — false. Verified:
```
$ curl -sI https://wheels.vllm.ai/rocm/0.21.0/rocm722/
HTTP/2 200
```
The index returns vllm-0.21.0+rocm722-cp312-cp312-manylinux_2_34_x86_64.whl.
"Needs v prefix (v0.21.0)" — false. The existing CUDA stage in the same Dockerfile uses wheels.vllm.ai/${VLLM_VERSION}/cu130 without a v prefix and works fine.
"rocm722 is a typo, should be rocm61/rocm62" — false. rocm722 is exactly the suffix you suggested in your comment on feat: add vLLM ROCm Docker variant for AMD GPUs #917 (https://wheels.vllm.ai/rocm/0.21.0/rocm722), which this PR implements.

Sourcery's review — correct, fixed in last commit (2dbc908):

Without pinning vllm==${VLLM_ROCM_VERSION}, uv resolves against both PyPI and our extra index. If a newer vLLM (say 0.22.0) lands on PyPI before the matching ROCm wheel is published, --index-strategy unsafe-best-match would pick the higher PyPI version — silently installing a CUDA-only wheel into the ROCm image. Now pinned.

dmedovich mentioned this pull request May 19, 2026

feat: add vLLM ROCm Docker variant for AMD GPUs #917

Merged

sourcery-ai Bot reviewed May 19, 2026

View reviewed changes

Comment thread Dockerfile Outdated

gemini-code-assist Bot reviewed May 19, 2026

View reviewed changes

ericcurtin approved these changes May 19, 2026

View reviewed changes

ericcurtin merged commit 7b3a04d into docker:main May 20, 2026
8 of 10 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor: install vLLM ROCm from upstream wheels#923

refactor: install vLLM ROCm from upstream wheels#923
ericcurtin merged 2 commits into
docker:mainfrom
dmedovich:feat/vllm-rocm-use-wheels

dmedovich commented May 19, 2026

Uh oh!

sourcery-ai Bot left a comment

Uh oh!

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot May 19, 2026

Uh oh!

ericcurtin commented May 19, 2026

Uh oh!

dmedovich commented May 19, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

dmedovich commented May 19, 2026

Uh oh!

sourcery-ai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot May 19, 2026

Choose a reason for hiding this comment

Uh oh!

ericcurtin commented May 19, 2026

Uh oh!

dmedovich commented May 19, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants