Skip to content

refactor: install vLLM ROCm from upstream wheels#923

Merged
ericcurtin merged 2 commits into
docker:mainfrom
dmedovich:feat/vllm-rocm-use-wheels
May 20, 2026
Merged

refactor: install vLLM ROCm from upstream wheels#923
ericcurtin merged 2 commits into
docker:mainfrom
dmedovich:feat/vllm-rocm-use-wheels

Conversation

@dmedovich
Copy link
Copy Markdown
Contributor

Replaces the build-from-source approach (introduced in #917) with the upstream vLLM ROCm wheels published at https://wheels.vllm.ai/rocm/, as suggested by @ericcurtin.

Builds now take ~3-5 min instead of ~30-60 min, and the runtime image keeps the llamacpp ROCm base — restoring llama.cpp as an in-image fallback (matching the shape of the CUDA vllm variant).

vLLM ROCm wheels follow a separate release cadence from CUDA, so VLLM_ROCM_VERSION (default 0.21.0) is tracked independently from VLLM_VERSION (CUDA, default 0.19.1).

Refs #412

Replaces the build-from-source approach (introduced in #917) with the
upstream vLLM ROCm wheels published at https://wheels.vllm.ai/rocm/,
as suggested by @ericcurtin.

Builds now take ~3-5 min instead of ~30-60 min, and the runtime image
keeps the llamacpp ROCm base — restoring llama.cpp as an in-image
fallback (matching the shape of the CUDA vllm variant).

vLLM ROCm wheels follow a separate release cadence from CUDA, so
VLLM_ROCM_VERSION (default 0.21.0) is tracked independently from
VLLM_VERSION (CUDA, default 0.19.1).

Refs #412
Copy link
Copy Markdown
Contributor

@sourcery-ai sourcery-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 1 issue, and left some high level feedback:

  • Consider explicitly pinning the vLLM package to VLLM_ROCM_VERSION in the uv install step (e.g., vllm==${VLLM_ROCM_VERSION}) so the wheel index version and the installed package version cannot drift apart.
  • In the apt install step, you can add --no-install-recommends to reduce the image size and avoid pulling in unnecessary packages.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- Consider explicitly pinning the vLLM package to VLLM_ROCM_VERSION in the uv install step (e.g., `vllm==${VLLM_ROCM_VERSION}`) so the wheel index version and the installed package version cannot drift apart.
- In the `apt install` step, you can add `--no-install-recommends` to reduce the image size and avoid pulling in unnecessary packages.

## Individual Comments

### Comment 1
<location path="Dockerfile" line_range="183-184" />
<code_context>
+RUN curl -LsSf https://astral.sh/uv/install.sh | sh \
+    && ~/.local/bin/uv venv --python 3.12 /opt/vllm-env \
+    && . /opt/vllm-env/bin/activate \
+    && ~/.local/bin/uv pip install vllm \
+         --extra-index-url https://wheels.vllm.ai/rocm/${VLLM_ROCM_VERSION}/${VLLM_ROCM_TARGET} \
+         --index-strategy unsafe-best-match

</code_context>
<issue_to_address>
**issue (bug_risk):** Tie the installed vLLM package version to VLLM_ROCM_VERSION to avoid pulling a mismatched wheel from PyPI.

Because `uv pip install vllm` resolves against PyPI first, it may select a newer vLLM without ROCm wheels and then fall back to non-ROCm wheels, silently breaking the ROCm build and reproducibility. Please pin the version to match the ROCm wheel path, e.g.:

```dockerfile
&& ~/.local/bin/uv pip install "vllm==${VLLM_ROCM_VERSION}" \
     --extra-index-url https://wheels.vllm.ai/rocm/${VLLM_ROCM_VERSION}/${VLLM_ROCM_TARGET} \
     --index-strategy unsafe-best-match
```

so both the version and wheel source are driven by `VLLM_ROCM_VERSION`.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment thread Dockerfile Outdated
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

Review: vLLM ROCm Wheel Integration\n\n### Summary\nThis PR refactors the vLLM ROCm build process to use pre-built wheels from the official repository instead of building from source, which significantly reduces build time and aligns the architecture with the CUDA variant.\n\n### Critical\n- Dockerfile:184: The --extra-index-url is likely malformed as the repository is typically flat and the ROCm target version (rocm722) appears to be a typo. Fix: --extra-index-url https://wheels.vllm.ai/rocm

Comment thread Dockerfile
&& ~/.local/bin/uv venv --python 3.12 /opt/vllm-env \
&& . /opt/vllm-env/bin/activate \
&& ~/.local/bin/uv pip install vllm \
--extra-index-url https://wheels.vllm.ai/rocm/${VLLM_ROCM_VERSION}/${VLLM_ROCM_TARGET} \
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

The extra index URL appears to be malformed. The official vLLM ROCm wheel repository at wheels.vllm.ai/rocm is a flat repository and typically does not use versioned subdirectories in the path. Furthermore, versioned paths in the vLLM project usually require a v prefix (e.g., v0.21.0). Additionally, rocm722 is likely a typo for a current ROCm version like rocm61 or rocm62. Using an invalid URL will cause the build to fail.

         --extra-index-url https://wheels.vllm.ai/rocm \

@ericcurtin
Copy link
Copy Markdown
Contributor

@dmedovich sometimes the bots are wrong, but can you confirm if they are right or wrong?

…back

Without pinning, uv resolves vllm against both PyPI and our extra ROCm
wheel index. If a newer vLLM lands on PyPI before the matching ROCm
wheel is published there, --index-strategy unsafe-best-match would pick
the higher PyPI version — silently installing a CUDA-only wheel into
the ROCm image.

Pinning vllm==${VLLM_ROCM_VERSION} ensures both the version and the
wheel source are driven by VLLM_ROCM_VERSION.
@dmedovich
Copy link
Copy Markdown
Contributor Author

@ericcurtin happy to confirm:

Gemini's review — wrong on all three points:

  1. "wheels.vllm.ai/rocm is a flat repo without versioned subdirs" — false. Verified:

    $ curl -sI https://wheels.vllm.ai/rocm/0.21.0/rocm722/
    HTTP/2 200
    

    The index returns vllm-0.21.0+rocm722-cp312-cp312-manylinux_2_34_x86_64.whl.

  2. "Needs v prefix (v0.21.0)" — false. The existing CUDA stage in the same Dockerfile uses wheels.vllm.ai/${VLLM_VERSION}/cu130 without a v prefix and works fine.

  3. "rocm722 is a typo, should be rocm61/rocm62" — false. rocm722 is exactly the suffix you suggested in your comment on feat: add vLLM ROCm Docker variant for AMD GPUs #917 (https://wheels.vllm.ai/rocm/0.21.0/rocm722), which this PR implements.

Sourcery's review — correct, fixed in last commit (2dbc908):

Without pinning vllm==${VLLM_ROCM_VERSION}, uv resolves against both PyPI and our extra index. If a newer vLLM (say 0.22.0) lands on PyPI before the matching ROCm wheel is published, --index-strategy unsafe-best-match would pick the higher PyPI version — silently installing a CUDA-only wheel into the ROCm image. Now pinned.

@ericcurtin ericcurtin merged commit 7b3a04d into docker:main May 20, 2026
8 of 10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants