Problem
The awf-api-proxy container healthcheck is flaky, causing agentic workflow failures at a high rate. We've analyzed 164 workflow runs in our repository over the period April 23 – May 4, 2026:
| Firewall version | Runs | Failures | Failure rate |
| --- | --- | --- | --- |
| v0.25.20 (CLI v0.68.3) | 33 | 27 | 82% |
| v0.25.28 (CLI v0.71.1) | 131 | 29 | 22% |
All failures occur in the "Execute GitHub Copilot CLI" step — the api-proxy container fails its healthcheck before the CLI can connect.
Root cause
The healthcheck start_period is too short (2s), causing the container to be marked unhealthy before it's ready. This was identified and fixed in:
However, no released CLI version includes firewall v0.25.29. The latest CLI (v0.71.1) still pins firewall v0.25.28.
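To illustrate why a short start_period can produce this failure, here is a simplified model of Docker-style healthcheck timing. Only the 2s start_period comes from the analysis above. The probe interval, retry count, and container readiness time are hypothetical values chosen for illustration, and the model ignores probe duration and timeouts, so treat it as a sketch rather than a description of the actual awf-api-proxy configuration.

```python
from typing import Optional


def first_unhealthy_time(
    ready_at: int,
    start_period: int,
    interval: int,
    retries: int,
    horizon: int = 120,
) -> Optional[int]:
    """Return the second at which the container is marked unhealthy,
    or None if a probe succeeds first.

    Simplified model (assumptions, not the real implementation):
    - probes run every `interval` seconds, the first one `interval`
      seconds after container start;
    - a probe succeeds once the service is ready (t >= ready_at);
    - failures before `start_period` do not count toward `retries`;
    - `retries` counted consecutive failures mark the container unhealthy.
    """
    counted_failures = 0
    t = interval
    while t <= horizon:
        if t >= ready_at:
            return None  # probe succeeds, container becomes healthy
        if t >= start_period:
            counted_failures += 1
            if counted_failures >= retries:
                return t
        t += interval
    return None


# Hypothetical scenario: the service needs 5s to become ready,
# probes run every 1s, and 3 counted failures mark it unhealthy.
print(first_unhealthy_time(ready_at=5, start_period=2, interval=1, retries=3))
print(first_unhealthy_time(ready_at=5, start_period=10, interval=1, retries=3))
```

In this model, a start_period shorter than the service's startup time lets early failures count against the retry budget, while a longer start_period lets the container reach its first successful probe.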
Request
Release a new gh-aw CLI version that includes firewall v0.25.29 so the healthcheck fix is available to consumers.
Workaround
We've implemented an automatic rerun workflow that retries failed agentic runs (max 2 attempts), which reduces the effective failure rate to near-zero — but it wastes CI minutes and adds latency.
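The cost of this workaround can be estimated roughly from the run counts in the table. The sketch below assumes each attempt fails independently at the observed per-run rate, which is an assumption and not something the data above verifies. If "max 2 attempts" means two reruns after the initial run, pass 3 as `max_attempts` instead of 2.

```python
def failure_rate(failures: int, runs: int) -> float:
    """Observed per-run failure rate."""
    return failures / runs


def effective_failure_rate(p: float, max_attempts: int) -> float:
    """Probability that every attempt fails, assuming independent attempts."""
    return p ** max_attempts


def expected_attempts(p: float, max_attempts: int) -> float:
    """Expected number of attempts per workflow run.

    Attempt k+1 only happens if the first k attempts all failed,
    which has probability p**k under the independence assumption.
    """
    return sum(p ** k for k in range(max_attempts))


# Run counts taken from the table above: (failures, runs).
observed = {
    "v0.25.20": (27, 33),
    "v0.25.28": (29, 131),
}

for version, (failures, runs) in observed.items():
    p = failure_rate(failures, runs)
    print(
        f"{version}: per-run {p:.1%}, "
        f"after 2 attempts {effective_failure_rate(p, 2):.1%}, "
        f"expected attempts {expected_attempts(p, 2):.2f}"
    )
```

The expected-attempts figure is a proxy for the extra CI minutes consumed by retries. Real reruns may behave differently from this estimate, since failures caused by container startup timing are not necessarily independent between attempts.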