- Agent prompts live in
agents/*.md. - The simple self-improvement kit lives in
.claude/commands/eval.md,eval/rubric.md,eval/results.tsv, andeval/role-baselines/. - The old Python eval harness has been removed. The active eval path is the simple self-improvement kit above.
- Do not push directly to
main. - For requested edits, create a short-lived branch, commit there, push the branch, open a PR, and merge the PR with
gh pr merge. - Do not merge the feature branch into local
mainbefore opening or merging the PR. That creates duplicate local merge commits and makesmainappear ahead/behind after GitHub merges the PR. - After a PR is merged, run
git fetch origin --pruneand align localmaintoorigin/mainbefore continuing work. - For docs-only or metadata-only changes, the streamlined path is: branch -> commit -> push branch ->
gh pr create->gh pr merge --merge --delete-branch-> sync localmain.
- When asked to run the healthcare self-improvement loop for an agent, first read
.claude/commands/eval.mdand execute that procedure as a normal task, substituting$ARGUMENTSwith the requested agent slug. - Treat
.claude/commands/eval.mdas the canonical workflow for both Claude Code and Codex. - If the runtime supports native subagents or model specialization, prefer a strongest scorer/judge plus a faster editor, with the parent agent owning git writes and
eval/results.tsv. - Avoid recursive CLI invocation when native subagents are available.
- Never modify
eval/rubric.mdor any file undereval/role-baselines/. - Never modify
eval/results.tsvexcept to append rows. - Preserve the agent's distinctive role identity; do not flatten prompts into generic "best practices" boilerplate.
- During a normal eval run, only edit the requested
agents/<slug>.md, appendeval/results.tsv, and write local ignored artifacts undereval/run-logs/.