
feat(py-client): add protoc-gen-py-client (Python HTTP client generator) #172

Merged
SebastienMelki merged 14 commits into main from feat/py-client on May 19, 2026

Conversation

@SebastienMelki
Owner

Adds protoc-gen-py-client, the sixth generator in the sebuf toolkit. Generates type-safe Python HTTP clients that depend only on the Python standard library (Python 3.10+); users plug in requests / httpx / aiohttp via a duck-typed HttpTransport Protocol at construction time.
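
To illustrate the duck-typed transport idea, here is a minimal sketch. The method name `request`, its parameters, and the `(status, headers, body)` return shape are assumptions made for this example; the generated `HttpTransport` Protocol may look different.

```python
from typing import Mapping, Optional, Protocol


class HttpTransport(Protocol):
    """Hypothetical transport shape; the generated Protocol may differ."""

    def request(
        self,
        method: str,
        url: str,
        headers: Mapping[str, str],
        body: Optional[bytes],
        timeout: Optional[float],
    ) -> tuple[int, dict[str, str], bytes]:
        ...


class RecordingTransport:
    """Matches the Protocol by shape alone; no base class is required."""

    def __init__(self) -> None:
        self.calls: list[tuple[str, str]] = []

    def request(self, method, url, headers, body, timeout):
        self.calls.append((method, url))
        return 200, {"Content-Type": "application/json"}, b'{"ok": true}'


def fetch_status(transport: HttpTransport, url: str) -> int:
    # Any object with a compatible request() method is accepted here.
    status, _headers, _body = transport.request("GET", url, {}, None, 5.0)
    return status
```

A wrapper around requests, httpx, or aiohttp would plug in the same way as `RecordingTransport`: it only needs a method with a compatible signature.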

What's in the box

  • cmd/protoc-gen-py-client/ + internal/pyclientgen/ — full generator
  • Per-proto-file output: message @dataclasses, IntEnums, HttpTransport Protocol + UrllibTransport default, typed *ClientOptions / *CallOptions, one client class per service, and a per-*Error-message ApiError subclass hierarchy chosen at runtime from response shape via _ERROR_CLASSES
  • All (sebuf.http.*) JSON-mapping annotations honored: int64_encoding, enum_encoding + enum_value, nullable, empty_behavior, timestamp_format, bytes_encoding, oneof_config + oneof_value, flatten + flatten_prefix, unwrap (map-value + root)
  • Path params, query params (urlencode(doseq=True)), typed service + method header kwargs, content-type negotiation surface, Python 3.10 keyword escaping (hard + soft)
  • 15 per-feature golden tests + helper unit tests; every generated file is fed to python3 -c "import ast; ast.parse(...)" so syntactic regressions are caught even when the golden string-compare wouldn't fire
  • examples/python-client-demo/ mirrors examples/ts-client-demo/ section-by-section against the same Go HTTP server — make demo runs the full suite end-to-end

What's deferred

Filed as follow-up issues:

Generator bugs caught during Phase 2 testing

These were latent in the implementation commits and fixed before locking the goldens:

  • Empty error-registry set emitted as {} (a dict) instead of set()
  • Redundant pass followed by methods on field-less messages
  • google.protobuf.Timestamp typed as int/str for unix/date formats while the encoder always called datetime methods — aligned on datetime everywhere
  • Missing None-guard on WKT message-kind fields (Timestamp, Duration, Any, …) in to_dict
  • Enum variant names were prefix-stripped and lowercased (Priority.high) but the encoder emitted IntEnum.name, producing "high" instead of Go's "PRIORITY_HIGH", which is wire-format-breaking. Restored verbatim proto names

Credit

This builds on test fixtures and serialization ideas from @elzalem's #132. The architecture intentionally follows the existing tsclientgen / clientgen pattern (calling internal/annotations/ directly) rather than the contractmodel layer from that PR — so we could ship without blocking on the C# generator design discussion separately tracked on #131.

Test plan

  • make build — produces bin/protoc-gen-py-client
  • go test ./internal/pyclientgen/... — golden tests + helper unit tests all pass
  • golangci-lint run ./internal/pyclientgen/... ./cmd/protoc-gen-py-client/... — 0 issues
  • cd examples/python-client-demo && make demo — all 7 demo sections green end-to-end against the real Go server
  • Eyeball docs/python-generation.md and the new entries in README.md + CLAUDE.md

🤖 Generated with Claude Code

SebastienMelki and others added 10 commits May 19, 2026 12:17
Stand up cmd/protoc-gen-py-client and internal/pyclientgen mirroring the
tsclientgen layout: one generated _client.py per .proto source, stdlib-only
output, dataclasses + IntEnum + Protocol-typed transport. Field rendering,
JSON-mapping annotations, and RPC method bodies land in subsequent commits.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ions

Replace the message scaffold with full @dataclass field rendering plus
to_dict / from_dict serialization. Honors int64_encoding, enum_encoding +
enum_value, bytes_encoding, timestamp_format, nullable, empty_behavior,
unwrap (root + map-value), flatten + flatten_prefix, and oneof discriminator
configurations. Adds a Python type-mapping helper module and JSON encode /
decode expression builders so each field collapses to one or two lines in
the generated to_dict / from_dict.

Enum decoding emits a per-enum helper that accepts string (proto name or
custom enum_value) or int wire forms and raises on unknown values.
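
A per-enum decode helper of the kind described above might look roughly like this. The enum, the helper name `decode_priority`, and the custom wire name `"hi"` are hypothetical and stand in for whatever the generator emits.

```python
from enum import IntEnum


class Priority(IntEnum):
    PRIORITY_UNSPECIFIED = 0
    PRIORITY_LOW = 1
    PRIORITY_HIGH = 2


# Hypothetical custom wire names, standing in for enum_value annotations.
_PRIORITY_WIRE_NAMES = {"hi": Priority.PRIORITY_HIGH}


def decode_priority(value):
    """Accept a proto name, a custom wire name, or an int; raise otherwise."""
    if isinstance(value, int):
        return Priority(value)  # raises ValueError for unknown numbers
    if isinstance(value, str):
        if value in _PRIORITY_WIRE_NAMES:
            return _PRIORITY_WIRE_NAMES[value]
        try:
            return Priority[value]  # lookup by proto value name
        except KeyError:
            raise ValueError(f"unknown Priority value: {value!r}") from None
    raise ValueError(f"unsupported Priority wire type: {type(value).__name__}")
```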

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…archy

Flesh out the client class with real request/response handling:
- path parameter substitution via urllib.parse.quote
- query parameter encoding via urlencode(doseq=True), with proper guards for
  string/bool/numeric/repeated fields
- header building from default + per-call + typed service/method header options
  generated from sebuf.http.service_headers and method_headers annotations
- transport invocation through the injectable HttpTransport protocol with
  per-call timeout fallback
- response parsing using each message's generated from_dict
- content-type negotiation surface (JSON implemented, proto raises
  NotImplementedError until a follow-up adds binary protobuf encoding)
- SSE streaming methods detected via HttpConfig.stream and emit
  NotImplementedError pointing at the follow-up issue
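
The path and query handling in the list above can be sketched with the standard library alone. The function name `build_url`, the `{name}` placeholder syntax, the choice of `safe=""`, and the rule of skipping `None` values are assumptions for this example, not a copy of the generated code.

```python
from urllib.parse import quote, urlencode


def build_url(base_url, template, path_params, query):
    """Substitute {name} placeholders, then append an encoded query string."""
    path = template
    for name, value in path_params.items():
        # safe="" also escapes "/", so a value cannot add path segments.
        path = path.replace("{" + name + "}", quote(str(value), safe=""))
    # Drop unset filters; doseq=True expands lists into repeated keys.
    pairs = {k: v for k, v in query.items() if v is not None}
    qs = urlencode(pairs, doseq=True)
    return base_url.rstrip("/") + path + ("?" + qs if qs else "")
```

With `doseq=True`, a repeated field such as `{"tag": ["x", "y"]}` is encoded as `tag=x&tag=y` rather than as the string form of the list.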

Replace the error stub with full per-*Error-message exception classes. Each
class subclasses ApiError, exposes proto fields as constructor kwargs, and
ships a populate() classmethod that builds an instance from a parsed JSON
dict. An _ERROR_CLASSES registry indexed by required JSON key set lets the
client's _raise_for_status pick the most specific exception for a response.
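
As a rough sketch of registry-based selection: the error classes, their JSON keys, and the tie-breaking rule below are all hypothetical. In particular, this example assumes "most specific" means the largest required key set that is fully present in the response.

```python
class ApiError(Exception):
    def __init__(self, status, data):
        super().__init__(f"HTTP {status}")
        self.status = status
        self.data = data


class NotFoundError(ApiError):
    """Hypothetical error with two required JSON keys."""


class RateLimitError(ApiError):
    """Hypothetical error with one required JSON key."""


# (required JSON keys, exception class)
_ERROR_CLASSES = [
    (frozenset({"resourceType", "resourceId"}), NotFoundError),
    (frozenset({"retryAfterSeconds"}), RateLimitError),
]


def pick_error_class(data):
    """Return the registered class whose required keys all appear in data."""
    keys = set(data)
    matches = [(req, cls) for req, cls in _ERROR_CLASSES if req <= keys]
    if not matches:
        return ApiError  # fall back to the generic base class
    # Assumption: the largest matching key set counts as most specific.
    return max(matches, key=lambda m: len(m[0]))[1]
```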

Add a constants.go module with Python type names and well-known type
proto-name constants to satisfy goconst, and tighten every switch with the
appropriate nolint:exhaustive pragmas where the default branch is intentional.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Generator implementation is complete on this branch (3 commits). This doc
hands off the remaining test, demo, docs, and PR-opening work to the next
agent. Includes file-by-file pointers to the patterns to mirror, the lint
command tuned for go.mod 1.26, a note about the pre-existing openapiv3 test
failure on main, and the rationale for not cherry-picking from PR #132.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…bugs

Adds 15 per-feature test protos mirroring tsclientgen's testdata (plus a
new errors.proto exercising the per-*Error exception class generation
that is unique to py-client) and a golden test harness that also runs
`python3 -c "import ast; ast.parse(...)"` on each generated file to
catch syntactic regressions a string-compare cannot.

Capturing the goldens surfaced four generator bugs, all fixed here:

- error.go: empty set literal was emitted as `{}`, which is an empty
  dict — violated the registry's `set[str]` type and would have crashed
  if the runtime guard ever fell through.
- message.go: empty messages emitted `pass` followed by methods, which
  is valid Python but redundant and noisy. The methods alone keep the
  class body non-empty.
- types.go: Timestamp WKT fields with unix-seconds / unix-millis /
  date formats were typed as `int` / `str`, but encoding.go always
  calls `.timestamp()` / `.strftime()`, assuming `datetime`. Aligned
  on `datetime` for every timestamp_format — the format only affects
  the wire encoding, not the user-facing type.
- message.go: WKT message-kind fields (Timestamp, Duration, FieldMask,
  Any, Empty, Struct, scalar wrappers) routed through the scalar
  to_dict path were emitted unconditionally, even though they are
  always nullable in proto3. Guard them like proto3 `optional` scalars
  so the encoder never receives a `None` default, which would raise
  AttributeError.
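
The first bug in the list comes down to a Python quirk: `{}` is an empty dict, and the only spelling of an empty set is `set()`. The helper below is a Python analogue written for illustration; the real emitter lives in the Go generator and is not reproduced here.

```python
def format_py_string_set(values):
    """Render Python source for a set of strings."""
    if not values:
        return "set()"  # "{}" would evaluate to an empty dict instead
    return "{" + ", ".join(repr(v) for v in sorted(values)) + "}"
```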

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…tants

Covers the pure helpers that the golden tests exercise only indirectly:

- snakeCase: CamelCase → snake_case method-name conversion
- headerOptionName: HTTP header → Python kwarg, with keyword-collision
  escape (X-Class → class_)
- escapePyKeyword: hard + soft Python 3.10 keywords
- formatPyStringSet: empty input emits set(), not the dict-literal {}
- stripOptional, camelToSnake, isInvalidIdentifier: small string utils
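
For readers who think in Python rather than Go, the naming helpers above behave roughly like the sketch below. These are illustrative analogues, not the generator's code. Dropping a leading "X-" is an assumption inferred from the X-Class example, and the acronym handling in `snake_case` is deliberately naive.

```python
import keyword
import re


def escape_py_keyword(name):
    """Append "_" to hard and soft keywords so they are usable as names."""
    if keyword.iskeyword(name) or keyword.issoftkeyword(name):
        return name + "_"
    return name


def snake_case(name):
    """Insert "_" before each interior capital letter, then lowercase."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


def header_option_name(header):
    """Turn an HTTP header name into a Python keyword-argument name."""
    # Assumption: a leading "X-" is dropped, matching the X-Class example.
    stripped = header[2:] if header.lower().startswith("x-") else header
    return escape_py_keyword(stripped.replace("-", "_").lower())
```

`keyword.issoftkeyword` covers names such as `match` and `case`, which are only reserved in certain positions but are still awkward as generated identifiers.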

Also lifts three repeated literals into constants (pyFalse, pyEmptySet,
pyListStr) so goconst is happy and there is one place to change the
emitted Python idiom for each.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The generator was stripping the enum-name prefix and lowercasing each
variant ("PRIORITY_HIGH" -> "high"). That was an ergonomic improvement
on paper but broke cross-generator wire compatibility: the Go server
emits enums via protojson default ("PRIORITY_HIGH"), while
_encode_enum_X falls back to IntEnum.name, which the renaming had
turned into "high". A Python client talking to a Go server was always
going to misparse enum-typed fields.

Keep the proto value name verbatim ("PRIORITY_HIGH") so .name and the
wire format agree. Users write Priority.PRIORITY_HIGH which is also
PEP 8-conformant (UPPER_CASE for enum members).

Removes the now-orphaned camelToSnake/isInvalidIdentifier helpers and
their unit tests. Verified end-to-end against the python-client-demo
Go server: enums round-trip correctly across CRUD, query filtering,
and the unwrap response path.
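
The mismatch is easy to reproduce in isolation. In this sketch, `RenamedPriority` imitates the earlier prefix-stripped naming and `encode_enum` is a stand-in for the generated encoder's fallback to `IntEnum.name`; both names are hypothetical.

```python
from enum import IntEnum


class RenamedPriority(IntEnum):
    """Earlier behaviour: prefix stripped and lowercased."""
    high = 2


class Priority(IntEnum):
    """Current behaviour: proto value names kept verbatim."""
    PRIORITY_HIGH = 2


def encode_enum(member):
    # Stand-in for an encoder that falls back to the member name.
    return member.name
```

Both members carry the same number, but only the verbatim name matches the "PRIORITY_HIGH" string that the Go server emits.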

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mirrors examples/ts-client-demo section-by-section so a reader can
compare the two client surfaces directly. Shares the proto + Go HTTP
server with the TS demo (NoteService — CRUD over Notes with enums,
maps, optional fields, headers, query params, validation, unwrap
response, and a typed NotFoundError).

The Python client demonstrates:

- Section 1: NoteServiceClientOptions with typed kwargs for service
  headers (api_key, tenant_id) and a default_headers escape hatch
- Section 2: every HTTP verb (GET/POST/PUT/PATCH/DELETE) with path
  params, request bodies, and method-level headers via call options
- Section 3: query parameter encoding for ListNotes
  (status/priority/sort/limit/offset)
- Section 4: header layering (service options vs call options vs
  per-call headers dict, and per-call override of a service header)
- Section 5: ValidationError parsing on min_len / max_len / missing
  required header — same buf.validate rules as the TS demo
- Section 6: typed NotFoundError exception subclass (not a generic
  ApiError) chosen by the _ERROR_CLASSES registry from response shape
- Section 7: custom HttpTransport injection (logging middleware) and
  the unwrap response path (NoteList.notes flattens on the wire)

Verified end-to-end against the Go server: `make demo` runs the full
suite cleanly with no failures.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ADME + CLAUDE.md

Adds the dedicated Python client reference and wires protoc-gen-py-client
into the toolkit overview in README.md and CLAUDE.md (now six plugins,
not five).

docs/python-generation.md covers: generator output (dataclasses, IntEnum,
transport Protocol, error hierarchy, options, client class), transport
injection, URL building (path + query params), header layering,
ApiError/ValidationError/typed *Error exceptions, every JSON-mapping
annotation (with focus on Timestamp/int64/bytes/oneof — same wire
format as the Go and TS generators), Python keyword escaping, the SSE
NotImplementedError stub, known limitations, and a link to
examples/python-client-demo.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Both planning docs were time-limited handoffs between agents working on
this branch (PYTHON_CLIENT_REWRITE.md → the rewrite plan after PR #132
was closed; PY_CLIENT_HANDOFF.md → the Phase 2 testing/docs/PR handoff).
The work they tracked is now landed on this branch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@codecov

codecov Bot commented May 19, 2026

Codecov Report

❌ Patch coverage is 2.79965% with 1111 lines in your changes missing coverage. Please review.
✅ Project coverage is 4.52%. Comparing base (1b08604) to head (cc97e9f).

Files with missing lines              Patch %   Lines
internal/pyclientgen/message.go       0.00%     300 Missing ⚠️
internal/pyclientgen/client.go        7.11%     222 Missing ⚠️
internal/pyclientgen/error.go         5.10%     130 Missing ⚠️
internal/pyclientgen/types.go         3.20%     121 Missing ⚠️
internal/pyclientgen/encoding.go      0.00%     118 Missing ⚠️
internal/pyclientgen/collect.go       0.00%     76 Missing ⚠️
internal/pyclientgen/transport.go     0.00%     49 Missing ⚠️
internal/pyclientgen/enum.go          0.00%     40 Missing ⚠️
internal/pyclientgen/generator.go     0.00%     31 Missing ⚠️
internal/pyclientgen/imports.go       0.00%     14 Missing ⚠️
... and 1 more
Additional details and impacted files
@@           Coverage Diff            @@
##            main    #172      +/-   ##
========================================
- Coverage   4.97%   4.52%   -0.45%     
========================================
  Files         35      47      +12     
  Lines       4443    5586    +1143     
========================================
+ Hits         221     253      +32     
- Misses      4218    5329    +1111     
  Partials       4       4              
Flag Coverage Δ
unittests 4.52% <2.79%> (-0.45%) ⬇️


@github-actions

github-actions Bot commented May 19, 2026

🔍 CI Pipeline Status

Lint: success
Test: success
Coverage: success
Build: success
Integration: success


📊 Coverage Report: Available in checks above
🔗 Artifacts: Test results and coverage reports uploaded

@yashagarwal-sarwa

Bug: error classes emitted before the enums they reference

When a proto message with an Error suffix references an enum field with a default value, the generated error class is declared before the enum — causing a NameError at import time.

Reproduction: Our EventError message has code: RejectionReason as a field. The generated output emits class EventError(ApiError) at line 99, but class RejectionReason(IntEnum) isn't declared until line 215.

class EventError(ApiError):
    code: RejectionReason = RejectionReason.REJECTION_REASON_UNSPECIFIED  # NameError here
Fix: Enums should be emitted before error classes in the output ordering. Looks like it's a topological sort issue in the emission order — error classes need to come after all enums they depend on.

When an *Error message has an enum-typed field, the generated default
expression — code: Reason = Reason.X — is evaluated at class-definition
time, so the enum class must already be declared. The previous file
ordering emitted writeErrors before the enum loop, raising NameError at
import time for any error that referenced an enum.

Reorder the file so enums are written before writeErrors. Messages
already trailed both blocks and don't need adjustment — message-typed
defaults are always None, so forward references in them are safe.

Add a regression case to testdata/proto/errors.proto (EventError +
RejectionReason) matching the exact shape @yashagarwal-sarwa reported
on #172, and upgrade the golden test to actually execute each generated
file via importlib (ast.parse only checks syntax, not runtime
NameErrors). The new import check also registers the module in
sys.modules so @dataclass machinery can resolve string annotations from
`from __future__ import annotations`.

Reported-by: @yashagarwal-sarwa

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@SebastienMelki
Owner Author

@yashagarwal-sarwa great catch — confirmed and fixed in 0166439.

Root cause was exactly what you diagnosed: writeErrors ran before the enum loop in generateClientFile, so any *Error message with an enum-typed field referenced a class that wasn't defined yet. from __future__ import annotations only stringifies annotations, not default-value expressions, which are evaluated when the class body executes.
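
A small reproduction of that root cause, using the class names from the report. The source strings and the `import_error` helper are written for this example only. Both orderings pass `ast.parse`, but only the one with the enum first actually executes.

```python
import ast

BROKEN = '''
from __future__ import annotations
from enum import IntEnum

class EventError(Exception):
    code: RejectionReason = RejectionReason.REJECTION_REASON_UNSPECIFIED

class RejectionReason(IntEnum):
    REJECTION_REASON_UNSPECIFIED = 0
'''

FIXED = '''
from __future__ import annotations
from enum import IntEnum

class RejectionReason(IntEnum):
    REJECTION_REASON_UNSPECIFIED = 0

class EventError(Exception):
    code: RejectionReason = RejectionReason.REJECTION_REASON_UNSPECIFIED
'''


def import_error(source):
    """Return the exception type raised while executing source, else None."""
    ast.parse(source)  # succeeds for both orderings: the syntax is valid
    try:
        exec(compile(source, "<generated>", "exec"), {"__name__": "generated"})
    except Exception as exc:
        return type(exc)
    return None
```

The annotation `RejectionReason` is stored as a string and never evaluated, while the default on the right-hand side is an ordinary expression that runs as soon as the class body does.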

The fix is a one-line reorder (enums before errors) — messages already trailed both blocks, and message-typed defaults are always None so they're safe. Added an EventError + RejectionReason regression case to testdata/proto/errors.proto matching the exact shape you reported.

Also upgraded the golden test: every generated file now goes through importlib.util.spec_from_file_location + exec_module (with the module registered in sys.modules so @dataclass can resolve forward annotations). ast.parse alone wouldn't have caught this — NameError is a runtime error, not syntactic — so the import check is the real backstop going forward.
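
The import check can be approximated in pure Python as below. The actual harness lives in the Go test suite and is not shown in this thread, so the helper names `import_generated` and `check_source` and the cleanup behaviour are assumptions.

```python
import importlib.util
import sys
import tempfile
from pathlib import Path


def import_generated(path, module_name):
    """Execute a generated file as a real module and return it."""
    spec = importlib.util.spec_from_file_location(module_name, path)
    module = importlib.util.module_from_spec(spec)
    # Register before executing so code that looks the module up through
    # sys.modules (as dataclasses may for string annotations) can find it.
    sys.modules[module_name] = module
    try:
        spec.loader.exec_module(module)
    except BaseException:
        del sys.modules[module_name]
        raise
    return module


def check_source(source, module_name):
    """Write source to a temporary .py file and import it for real."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / f"{module_name}.py"
        path.write_text(source, encoding="utf-8")
        return import_generated(path, module_name)
```

Unlike `ast.parse`, this runs every class body, so a default that references an undefined name fails the check immediately.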

Verified end-to-end against the python-client-demo Go server: clean run.

@SebastienMelki
Owner Author

@yashagarwal-sarwa fix is up in 0166439 (now on the branch). Mind taking another look when you have a minute?

SebastienMelki requested review from elzalem and shavaizknz and removed the request for yashagarwal-sarwa on May 19, 2026 12:17
@yashagarwal-sarwa

yashagarwal-sarwa commented May 19, 2026

EventError.from_dict() missing — EventResult.from_dict() fails at runtime

Error classes generated from proto *Error messages get to_dict() and populate(), but NOT from_dict(). However, when EventError appears as a field on a regular message (EventResult.error), the generated EventResult.from_dict() calls EventError.from_dict() which doesn't exist:

In EventResult.from_dict() — generated code:

kwargs["error"] = EventError.from_dict(data["error"]) # AttributeError: no from_dict
Fix: Either add from_dict() as an alias for populate() on error classes, or have the message deserializer use populate() when the target type is an error class.

SebastienMelki and others added 3 commits May 19, 2026 15:27
When an *Error message is embedded as a field on another message,
the parent's generated from_dict calls EventError.from_dict(...) —
but error classes only had populate() and to_dict(), so the call
raised AttributeError at runtime.

Add a from_dict classmethod on every *Error class that delegates to
populate() with neutral status/body/headers. This keeps the error
class shape interchangeable with regular messages for serialization
purposes, which is what the parent message's deserializer assumes.
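
A minimal sketch of the delegation described above. The `populate()` signature, the neutral values (`0`, `b""`, `{}`), and the `code` / `detail` fields are assumptions for illustration; the generated classes may differ.

```python
class ApiError(Exception):
    def __init__(self, status=0, body=b"", headers=None):
        super().__init__(f"HTTP {status}")
        self.status = status
        self.body = body
        self.headers = headers or {}


class EventError(ApiError):
    def __init__(self, status=0, body=b"", headers=None, *, code=0, detail=""):
        super().__init__(status, body, headers)
        self.code = code
        self.detail = detail

    @classmethod
    def populate(cls, status, body, headers, data):
        """Build an instance from an HTTP response plus its parsed JSON."""
        return cls(status, body, headers,
                   code=data.get("code", 0), detail=data.get("detail", ""))

    @classmethod
    def from_dict(cls, data):
        # An embedded error has no HTTP response of its own, so the
        # transport-level arguments are filled with neutral values.
        return cls.populate(0, b"", {}, data)


def event_result_from_dict(data):
    """Stand-in for a parent message's from_dict with an `error` field."""
    error = EventError.from_dict(data["error"]) if "error" in data else None
    return {"error": error}
```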

Extend errors.proto with EventResult { EventError error } as a
regression case matching the exact shape @yashagarwal-sarwa reported
on #172. The import test (added in 0166439) catches the AttributeError
on the next regen attempt.

Reported-by: @yashagarwal-sarwa

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…t-map unwrap

Three generator bugs surfaced by the new examples/python-encoding-demo
end-to-end round-trip against a real Go server:

1. decodeTimestampExpr for TIMESTAMP_FORMAT_DATE returned the raw
   "YYYY-MM-DD" string instead of a datetime, even though the field
   type is datetime. Now parses with datetime.strptime so the assigned
   value matches the declared annotation.

2. The Python flatten encoder iterated nested.to_dict().items() and
   prefix-tagged each key — which used JSON names (camelCase). The Go
   HTTP plugin's flatten encoder uses proto names (snake_case), so the
   Python side emitted `author_zipCode` while the server emits
   `author_zip_code`, breaking round-trips. Rewritten to emit one wire
   key per nested field using the field's proto name, with the matching
   decoder reading those keys directly. Encoder + decoder now agree
   with the Go server byte for byte.

3. annotations.FindUnwrapField is documented as a list-only helper, but
   py-client's root-unwrap codepaths called it for messages whose unwrap
   field is a map. That silently produced empty `to_dict() -> {}` /
   `from_dict() -> cls()` on every root-map unwrap message. Added a
   local findRootUnwrapField that doesn't filter on IsList(); kept the
   shared helper unchanged so other generators stay untouched.
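
For the flatten fix in item 2, the corrected shape can be sketched as follows. `Post` is a hypothetical parent message with a flattened `author` field and an `author_` prefix; the hand-written methods approximate what the generator would emit and are not its actual output.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Address:
    city: str = ""
    zip_code: str = ""


@dataclass
class Post:  # hypothetical parent message with a flattened `author` field
    title: str = ""
    author: Optional[Address] = None

    def to_dict(self) -> dict:
        out = {"title": self.title}
        if self.author is not None:
            # One wire key per nested field: prefix + proto (snake_case) name.
            out["author_city"] = self.author.city
            out["author_zip_code"] = self.author.zip_code
        return out

    @classmethod
    def from_dict(cls, data: dict) -> "Post":
        keys = ("author_city", "author_zip_code")
        author = None
        if any(k in data for k in keys):
            # The decoder reads the same prefixed keys the encoder writes.
            author = Address(
                city=data.get("author_city", ""),
                zip_code=data.get("author_zip_code", ""),
            )
        return cls(title=data.get("title", ""), author=author)
```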

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two new end-to-end examples that round-trip every protoc-gen-py-client
feature (except SSE, tracked as #167) against a real Go server. Each
example follows the established repo pattern — single focus, Go server
+ Python client + `make demo` target.

examples/python-encoding-demo (51 assertions)
  Round-trips every JSON-mapping annotation: enum_value override,
  timestamp_format (RFC3339/UNIX_S/UNIX_MS/DATE), int64_encoding
  STRING+NUMBER, bytes_encoding base64+HEX, flatten+flatten_prefix,
  oneof_config nested + flattened variants, all three unwrap variants
  (root repeated, root map, map-value), Python keyword field-name
  escaping (`from`/`class`/`return`), and repeated query parameters.
  Each annotation lives on its own message because the Go HTTP plugin
  emits one MarshalJSON method per (message, annotation) and would
  produce duplicate methods otherwise.

  Writing this demo surfaced three real generator bugs that were
  invisible to the golden tests (they pass ast.parse and import-time
  exec but never check wire-format compatibility with the Go side).
  Fixes shipped in the preceding commit.

examples/python-errors-demo (41 assertions)
  Covers every error surface: ValidationError parsed from a buf.validate
  body, registry-based disambiguation across NotFoundError /
  ConflictError / RateLimitError, and an *Error embedded as a field on
  a regular response (the BatchCreateItemResult pattern from
  @yashagarwal-sarwa's #172 — exercises the FieldValidationError.
  from_dict alias that lives alongside populate()).

CLAUDE.md updated to list both examples in the project-structure
section.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@SebastienMelki
Owner Author

@yashagarwal-sarwa update — pushed the from_dict fix you flagged, plus broader coverage. The new commits are:

  • 3521398: your fix, a from_dict alias for *Error classes
  • 608ff0a: three more generator bugs the comprehensive demo caught
  • 53a7826: two new end-to-end examples, python-encoding-demo (51 assertions, every JSON-mapping annotation) and python-errors-demo (41 assertions, every error surface). Both run with make demo from their own directories and round-trip against a real Go server.

The encoding demo specifically surfaced three issues we didn't have coverage for before:

  1. TIMESTAMP_FORMAT_DATE decoded the wire string as a str even though the field type is datetime (annotation/code mismatch on the decode side)
  2. flatten emitted wire keys with JSON-camelCase names while the Go HTTP plugin uses proto snake_case — so Address.zip_code shipped as author_zipCode from Python but the server expected author_zip_code
  3. root-map unwrap silently produced empty to_dict() / from_dict() because annotations.FindUnwrapField is list-only by design (used by tsclientgen/openapiv3) and py-client called it on root unwrap paths
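
For the first item, the idea is that the user-facing type stays `datetime` for every timestamp_format and only the wire form changes. A rough sketch follows; the function names are hypothetical, and pinning date-only values to midnight UTC is an assumption of this example, not something the thread confirms.

```python
from datetime import datetime, timezone

DATE_FORMAT = "%Y-%m-%d"


def encode_date(value: datetime) -> str:
    return value.strftime(DATE_FORMAT)


def decode_date(wire: str) -> datetime:
    # Assumption: date-only values are pinned to midnight UTC.
    return datetime.strptime(wire, DATE_FORMAT).replace(tzinfo=timezone.utc)


def encode_unix_seconds(value: datetime) -> int:
    return int(value.timestamp())


def decode_unix_seconds(wire: int) -> datetime:
    return datetime.fromtimestamp(wire, tz=timezone.utc)
```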

All fixed and round-tripping clean. Mind taking another look?

SebastienMelki merged commit 6464a92 into main on May 19, 2026
10 checks passed