feat(api-proxy): middle-power model fallback when selection criteria fail #3606

@lpcox

Problem

Model selection in the api-proxy can fail in multiple ways:

  1. Requested model not in cachedModels — upstream returns 400/404
  2. Alias resolves to a glob with no matches — e.g., gpt-5.5* when only gpt-5.4 exists
  3. Family fallback exhausted: gpt-5.<minor> falls back to the gpt-5 alias, but that alias also has no match
  4. Provider temporarily unavailable for specific model — model exists in cache but upstream rejects it
  5. Model removed between cache refresh — stale cachedModels list

In all these cases, the proxy currently either forwards the request unchanged (resulting in upstream errors) or returns null (no rewrite). The agent then retries in a loop, wasting tokens and time.

Proposed Solution: Middle-Power Fallback Policy

When all other selection criteria fail, select the middle-power model from the available models for the target provider. "Middle-power" is defined as the median model by capability tier from the cached model list.

Selection Algorithm

1. Filter cachedModels to the same provider and model family (if determinable)
2. Sort by capability tier (e.g., opus > sonnet > haiku; gpt-5.x > gpt-4.x > gpt-3.5)
3. Select the median entry of the sorted list; for an even count, round the index down, i.e. take index floor((n - 1) / 2) of n entries
4. If family filtering yields 0 results, fall back to all models for that provider
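
The four steps above can be sketched as follows. This is a minimal illustration, not the proxy's actual code: the function name `selectMiddlePowerModel`, the `{ id, provider, family }` shape of a cached model, and the `tierOf` callback are all assumptions made for the example. It also assumes the list is sorted strongest first, so rounding down on an even count picks the more capable of the two middle entries.

```javascript
// Hypothetical helper; none of these names come from the api-proxy codebase.
// Assumes each cached model looks like { id, provider, family } and that
// tierOf(model) returns a number where higher means more capable.
function selectMiddlePowerModel(cachedModels, provider, family, tierOf) {
  const sameProvider = cachedModels.filter((m) => m.provider === provider);

  // Step 1: narrow to the model family when it is determinable.
  let candidates = family
    ? sameProvider.filter((m) => m.family === family)
    : sameProvider;

  // Step 4: the family filter matched nothing, so widen to the whole provider.
  if (candidates.length === 0) candidates = sameProvider;
  if (candidates.length === 0) return null; // nothing to fall back to

  // Step 2: sort strongest first; break ties by id so the order is stable.
  const sorted = [...candidates].sort(
    (a, b) => tierOf(b) - tierOf(a) || a.id.localeCompare(b.id)
  );

  // Step 3: median entry, rounding the index down for an even count.
  return sorted[Math.floor((sorted.length - 1) / 2)];
}

// Made-up model entries, for illustration only.
const demoModels = [
  { id: 'demo-haiku', provider: 'anthropic', family: 'claude', tier: 3 },
  { id: 'demo-opus', provider: 'anthropic', family: 'claude', tier: 5 },
  { id: 'demo-sonnet', provider: 'anthropic', family: 'claude', tier: 4 },
];
const picked = selectMiddlePowerModel(demoModels, 'anthropic', 'claude', (m) => m.tier);
console.log(picked.id); // demo-sonnet
```

Returning null for an empty provider list leaves the caller free to keep today's behavior (no rewrite) in the empty-list edge case.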

Capability Tier Ordering

Define a simple tier map:

  • Anthropic: opus (5) > sonnet (4) > haiku (3)
  • OpenAI/Copilot: gpt-5.x (5) > gpt-4.x (4) > gpt-3.5 (3)
  • Unknown: sort lexicographically, pick median
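
One way to express this tier map in code is a per-provider list of patterns. The sketch below is illustrative only: `TIER_RULES`, `tierOf`, and `sortByTier` are hypothetical names, the regular expressions are a guess at how model IDs would be matched, and giving unrecognized IDs within a known provider a tier of 0 is an assumption the issue does not specify.

```javascript
// Hypothetical tier map mirroring the list above; the patterns are assumptions.
const OPENAI_RULES = [
  [/^gpt-5/, 5],
  [/^gpt-4/, 4],
  [/^gpt-3\.5/, 3],
];
const TIER_RULES = {
  anthropic: [
    [/opus/, 5],
    [/sonnet/, 4],
    [/haiku/, 3],
  ],
  openai: OPENAI_RULES,
  copilot: OPENAI_RULES,
};

// Returns the tier for a model id, or 0 when no pattern matches
// (assumption: unrecognized ids sort after every known tier).
function tierOf(provider, modelId) {
  for (const [pattern, tier] of TIER_RULES[provider] || []) {
    if (pattern.test(modelId)) return tier;
  }
  return 0;
}

// Sorts strongest first for known providers; unknown providers get a plain
// lexicographic sort, as described in the "Unknown" bullet.
function sortByTier(provider, modelIds) {
  if (!TIER_RULES[provider]) return [...modelIds].sort();
  return [...modelIds].sort(
    (a, b) => tierOf(provider, b) - tierOf(provider, a) || a.localeCompare(b)
  );
}

console.log(sortByTier('copilot', ['gpt-3.5', 'gpt-5.4', 'gpt-4.1']));
// [ 'gpt-5.4', 'gpt-4.1', 'gpt-3.5' ]
```

Keeping the rules as data rather than branching logic means a new provider or tier is a one-line addition, and the debug-level candidate log can print the same sorted list.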

Logging Requirements

When the middle-power fallback activates, emit a clearly distinguishable structured log event:

{
  "level": "warn",
  "event": "model_fallback_activated",
  "provider": "copilot",
  "original_model": "gpt-5.5",
  "fallback_model": "gpt-4.1",
  "reason": "no_alias_match_and_not_in_available_models",
  "available_models_count": 24,
  "selection_method": "middle_power_median"
}

Key logging points:

  • model_fallback_activated (warn) — the fallback was used, with full context
  • model_fallback_candidates (debug) — the sorted tier list considered
  • model_fallback_skipped (info) — fallback was available but not needed (normal resolution succeeded)
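
A small builder keeps the event shape in one place. The sketch below only reproduces the fields from the example event above; the function name `buildFallbackEvent`, its parameter names, and writing one JSON object per line to stderr are assumptions about how the proxy might log, not a description of its existing logger.

```javascript
// Hypothetical builder for the model_fallback_activated event.
// Field names match the example event; everything else is illustrative.
function buildFallbackEvent({ provider, originalModel, fallbackModel, reason, availableModels }) {
  return {
    level: 'warn',
    event: 'model_fallback_activated',
    provider,
    original_model: originalModel,
    fallback_model: fallbackModel,
    reason,
    available_models_count: availableModels.length,
    selection_method: 'middle_power_median',
  };
}

const fallbackEvent = buildFallbackEvent({
  provider: 'copilot',
  originalModel: 'gpt-5.5',
  fallbackModel: 'gpt-4.1',
  reason: 'no_alias_match_and_not_in_available_models',
  availableModels: new Array(24).fill('placeholder-model'),
});

// Assumption: structured logs are emitted as one JSON object per line.
console.error(JSON.stringify(fallbackEvent));
```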

Configuration

The fallback should be:

  • Enabled by default — this is a safety net, not a feature toggle
  • Configurable via stdin config (not env var) — e.g., model_fallback.enabled: true, model_fallback.strategy: "middle_power"
  • Overridable per-alias — an alias definition could set fallback: false to disable for that specific alias
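
The three rules above could be resolved roughly as follows. This is a sketch under stated assumptions: `resolveFallbackConfig` and `DEFAULT_FALLBACK` are hypothetical names, the stdin config is assumed to arrive already parsed as an object with an optional `model_fallback` key, and an alias definition is assumed to be an object that may carry `fallback: false`.

```javascript
// Hypothetical defaults: the fallback is on unless explicitly disabled.
const DEFAULT_FALLBACK = { enabled: true, strategy: 'middle_power' };

// stdinConfig: parsed stdin config object (assumed shape).
// aliasDef: the alias definition being resolved, if any (assumed shape).
function resolveFallbackConfig(stdinConfig = {}, aliasDef = {}) {
  const merged = { ...DEFAULT_FALLBACK, ...(stdinConfig.model_fallback || {}) };
  // A per-alias `fallback: false` wins over the global setting.
  if (aliasDef.fallback === false) merged.enabled = false;
  return merged;
}

console.log(resolveFallbackConfig());
// { enabled: true, strategy: 'middle_power' }
console.log(resolveFallbackConfig({ model_fallback: { enabled: false } }).enabled);
// false
console.log(resolveFallbackConfig({}, { fallback: false }).enabled);
// false
```

In this sketch the alias override can only disable the fallback, never re-enable it when the global setting is off; whether an alias should be able to opt back in is left open by the issue.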

Integration Points

  • containers/api-proxy/model-resolver.js — add fallback logic after line ~170 (where current resolution returns null)
  • containers/api-proxy/model-discovery.js — expose tier-sorted model list
  • containers/api-proxy/server.js — wire fallback config from stdin

Acceptance Criteria

  • When model resolution fails (all paths exhausted), the middle-power model is selected
  • A model_fallback_activated warn-level log is emitted with full context
  • Debug-level log shows the candidate list and tier sorting
  • Fallback can be disabled via stdin config
  • Unit tests cover: fallback activation, tier sorting for each provider family, disabled fallback, empty model list edge case
  • /reflect endpoint includes model_fallback.enabled and model_fallback.strategy in its output

Labels: enhancement
