21 May 21:39

ajhcs

13dab78

Latest

Healthcare Agents v1.3.0 Release Notes

Released: 2026-05-21

This release turns Healthcare Agents from a portable prompt pack into a more
usable product surface. The 51 specialist healthcare administration agents are
now easier to discover, inspect, route, install selectively, and evaluate before
use.

Highlights

Published the package to npm as healthcare-agents.
Added registry-backed discovery and provenance metadata for all 51 agents.
Added CLI commands for list, show, choose, prompt, and doctor.
Added single-agent install support with slug validation.
Improved installer dry-run and doctor output for safer file writes.
Added public eval scorecard generation.
Added trust and safety documentation covering scope, PHI limits, human
escalation, source freshness, and eval interpretation limits.
Added CI gates for lint, audit, package, CLI, and installer smoke checks.

CLI Product Surface

The CLI now supports direct discovery workflows:

```
npx --yes healthcare-agents list
npx --yes healthcare-agents show revenue-cycle-specialist
npx --yes healthcare-agents choose "clean claim rate dropped"
npx --yes healthcare-agents prompt quality-compliance-officer --mode audit/checklist
npx --yes healthcare-agents doctor
```

Users can now install one agent instead of the full pack:

```
npx --yes healthcare-agents install revenue-cycle-specialist --codex --dry-run
```
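Single-agent install relies on slug validation. The notes do not show the rule the CLI applies, so the following is only a minimal Python sketch of one plausible check. The pattern, the `validate_slug` helper, and the `KNOWN` set are all hypothetical; they assume slugs are lowercase, hyphen-separated names such as the ones used in the commands above.

```python
import re

# Assumed slug shape: lowercase letters/digits in hyphen-separated segments,
# e.g. "revenue-cycle-specialist". The real CLI's rule may differ.
SLUG_PATTERN = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def validate_slug(slug: str, known_slugs: set[str]) -> str:
    """Return the slug if it is well-formed and known, else raise ValueError."""
    if not SLUG_PATTERN.fullmatch(slug):
        raise ValueError(f"malformed agent slug: {slug!r}")
    if slug not in known_slugs:
        raise ValueError(f"unknown agent slug: {slug!r}")
    return slug


# Hypothetical registry contents, using slugs that appear in these notes.
KNOWN = {"revenue-cycle-specialist", "quality-compliance-officer"}

print(validate_slug("revenue-cycle-specialist", KNOWN))
```

Rejecting malformed input before any lookup keeps path-like strings (for example `../x`) from ever reaching file-writing code, which is the usual motivation for validating a slug ahead of an install.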

Trust And Evaluation

The new registry and scorecard make the library easier to inspect without
opening every prompt manually. Scores remain internal prompt-rubric results, not
external certification, accreditation, legal review, coding validation, billing
approval, clinical validation, or compliance approval.

The prompts still do not create a PHI-safe runtime. Use approved environments,
minimum-necessary data, local source verification, and human sign-off for final
clinical, legal, coding, billing, audit, compliance, contracting, employment, or
executive decisions.

Validation

Validation performed before release:

bash -n install.sh
bash scripts/lint-agents.sh
python3 scripts/audit-agents.py --top 10
npm pack --dry-run
node bin/cli.js --help
node bin/cli.js list
bash install.sh --all --dry-run
node -c bin/cli.js
node -c scripts/generate-scorecard.js
git diff --check
GitHub Actions CI on pull request and main


05 May 21:09

ajhcs

v1.2.0

e40d7b4

Healthcare Agents v1.2.0

Healthcare Agents v1.2.0 Release Notes

Released: 2026-05-05

This release makes the 51-agent healthcare administration library easier to use
without adding runtime complexity. The prompts remain plain Markdown and
generated SKILL.md packages, but the user experience is now more explicit:
choose the right agent, provide the right inputs, request the right output mode,
and see the right handoffs when work crosses departments.

Highlights

Added task-based agent selection docs for common healthcare administration jobs.
Added copy-ready starter prompts across all 10 domains.
Added a cross-agent handoff map for workflows that span departments.
Added role-tailored Best Inputs, Output Modes, and Collaboration & Handoffs sections to all 51 agent prompts.
Updated installer-managed Codex guidance so installed users get the new routing, output-mode, and handoff behavior.
Added release-only usability smoke scenarios for future checks.

Agent Usability Improvements

Every agent now tells users what information produces the strongest answer,
which output modes it supports, which adjacent agents to involve, and which human
owners must make final high-risk decisions.

The four standardized output modes are:

quick triage: likely root causes, missing data, immediate checks, escalation triggers.
workplan: owners, dependencies, KPIs, sequence, validation checkpoints.
audit/checklist: evidence requests, pass/fail criteria, remediation owners.
artifact/template: a draft deliverable with assumptions, placeholders, and review notes.
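As an illustration only, the four modes can be held in a small lookup table so that a requested mode is checked before a prompt is built. The `OUTPUT_MODES` table and `resolve_mode` helper below are hypothetical names for this sketch, not part of the package; the mode names and summaries are taken from the list above.

```python
# Mode names and summaries copied from the release notes.
OUTPUT_MODES = {
    "quick triage": "likely root causes, missing data, immediate checks, escalation triggers",
    "workplan": "owners, dependencies, KPIs, sequence, validation checkpoints",
    "audit/checklist": "evidence requests, pass/fail criteria, remediation owners",
    "artifact/template": "a draft deliverable with assumptions, placeholders, and review notes",
}


def resolve_mode(requested: str) -> str:
    """Normalize a requested mode name and confirm it is one of the four."""
    mode = requested.strip().lower()
    if mode not in OUTPUT_MODES:
        valid = ", ".join(sorted(OUTPUT_MODES))
        raise ValueError(f"unknown output mode {requested!r}; expected one of: {valid}")
    return mode


print(resolve_mode("Audit/Checklist"))
```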

Documentation

New usage docs:

docs/usage/agent-selection-guide.md
docs/usage/starter-prompts.md
docs/usage/handoff-map.md
docs/eval/usability-release-check.md

The README now includes a compact "Choose the Right Agent" section with common
starting points and output modes.

Validation

Validation performed before release:

bash scripts/lint-agents.sh
python3 scripts/audit-agents.py --top 20
bash install.sh --all --dry-run
node bin/cli.js --help
git diff --check
Usability smoke scenarios from docs/eval/usability-release-check.md


23 Apr 23:58

ajhcs

v1.1.2

5e6ca94

v1.1.2 - GitHub npx Install

Healthcare Agents v1.1.2 Release Notes

Released: 2026-04-23

This patch release corrects the install docs. A check from this environment showed that healthcare-agents is not yet published on the public npm registry, so the npm-based install command did not work.

Changed

README and INSTALL examples now use the working GitHub-backed command:
```
npx --yes github:ajhcs/healthcare-agents install
```
Package and installer metadata now report 1.1.2.

Validation

npx --yes github:ajhcs/healthcare-agents install --version
bash scripts/lint-agents.sh
bash install.sh --all --dry-run
npm pack --dry-run


23 Apr 23:55

ajhcs

v1.1.1

7d3ff9a

v1.1.1 - Installer Compatibility

Healthcare Agents v1.1.1 Release Notes

Released: 2026-04-23

This patch release focuses on installability and cross-tool compatibility after the v1.1.0 agent-stack optimization release.

Highlights

Agent frontmatter now uses lowercase hyphen name values that match filenames, with human-readable labels preserved in display_name.
The installer now supports Codex App aliases, Claude Desktop/Cowork aliases, OpenCode skills, Claude Skills, and portable .agents/skills output.
Codex installs now add a managed ~/.codex/AGENTS.md block so Codex knows how to select and read the installed healthcare specialists.
The README and installation guide now reflect the v1.1.x eval status and current cross-tool file layouts.
The self-improvement kit installer now copies all 51 role baselines.
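The managed ~/.codex/AGENTS.md block implies an idempotent update: re-running the installer should replace its own block and leave the rest of the file alone. The installer's actual markers and logic are not shown in these notes, so the sketch below is an assumption-laden illustration in Python. The `BEGIN` and `END` marker strings and the `upsert_managed_block` helper are hypothetical.

```python
# Hypothetical markers; the real installer may delimit its block differently.
BEGIN = "<!-- healthcare-agents:begin -->"
END = "<!-- healthcare-agents:end -->"


def upsert_managed_block(existing: str, body: str) -> str:
    """Insert or replace the managed block, leaving all other text untouched."""
    block = f"{BEGIN}\n{body.strip()}\n{END}\n"
    start = existing.find(BEGIN)
    end = existing.find(END)
    if start != -1 and end > start:
        stop = end + len(END)
        # Swallow the newline that followed the old block so reruns are stable.
        if existing[stop:stop + 1] == "\n":
            stop += 1
        return existing[:start] + block + existing[stop:]
    # No managed block yet: append one, separated from any user content.
    separator = "" if not existing or existing.endswith("\n") else "\n"
    return existing + separator + block


first = upsert_managed_block("# My own notes\n", "Use the healthcare specialists.")
second = upsert_managed_block(first, "Use the healthcare specialists.")
print(first == second)
```

Writing between fixed markers is a common way to let a tool own part of a user-edited file without overwriting the user's own instructions.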

Validation

bash scripts/lint-agents.sh
bash -n install.sh
bash install.sh --all --dry-run
Temp-home install test for Claude agents, Codex agents/instructions, Claude skills, OpenCode skills, and .agents/skills


23 Apr 23:36

ajhcs

v1.1.0

5aab5ff

v1.1.0 - Agent Stack Optimization

Healthcare Agents v1.1.0 Release Notes

Released: 2026-04-23

This release upgrades the healthcare-agents stack from a broad first release into a calibrated, eval-driven 51-agent library. The work focused on two things: improving every installable healthcare agent, and making the eval/improvement loop reliable enough to run under current state-of-the-art coding and reasoning models.

Highlights

Improved all 51 healthcare administration agents.
Rebuilt the eval workflow around native subagents and model specialization.
Added role baselines for every installable agent.
Required exact, persisted Q001-Q025 question artifacts for before/after comparisons.
Removed the unused Python/DSPy harness and consolidated the project around the lightweight self-improvement workflow.

Agent Quality Improvements

All 51 prompts were evaluated and improved in two passes:

First 15 agents: average score improved from 85.0 to 93.9.
Remaining 36 agents: average score improved from 85.11 to 95.50.
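For readers who want one library-wide figure, the two pass averages can be combined into an agent-weighted mean. This is a back-of-the-envelope sketch, not a reported result: it assumes the two passes cover disjoint sets of agents (15 + 36 = 51) and works from the rounded averages above, so the output is approximate.

```python
# (agent count, average before, average after), copied from the two passes above.
passes = [
    (15, 85.0, 93.9),
    (36, 85.11, 95.50),
]

total_agents = sum(count for count, _, _ in passes)

# Weight each pass average by the number of agents it covers.
before = sum(count * pre for count, pre, _ in passes) / total_agents
after = sum(count * post for count, _, post in passes) / total_agents

print(total_agents)        # 51
print(round(before, 2))    # 85.08
print(round(after, 2))     # 95.03
```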

The prompt changes were intentionally narrow. They sharpen role mechanics, regulatory boundaries, source hierarchies, handoffs, deliverables, and edge-case behavior without flattening the agents into generic healthcare-administration assistants.

Major improvement areas included:

Clinical operations: observation/SNF status, utilization notices, infection prevention attribution, research consent and closeout controls, EMTALA transfer handling, and emergency-preparedness activation details.
Health IT: Epic master-file dependencies, interoperability replay/backfill controls, USCDI/TEFCA readiness, telehealth payer matrices, PHI extract governance, and AI/ambient documentation controls.
Payer and value-based care: network adequacy evidence, credentialing adverse-file routing, Medicare outreach boundaries, attribution and quality-gate controls, and downside-risk readiness.
Quality and population health: CAHPS setting selection, PSWP/PSES boundaries, QI/SPC mechanics, accreditation evidence, surveillance reporting matrices, CBO MOU controls, and community-benefit documentation.
Revenue and pharmacy: 340B duplicate-discount controls, CDM edit checks, contract analytics source hierarchy, EDI denial workflows, finance reserve boundaries, coding appeal source hierarchy, and medication-safety governance.
Strategy: actuarial certification/reliance caveats, MLR workflow detail, opportunity-sizing formulas, and predictive-operations validation checks.

Eval System Changes

The active eval system is now the lightweight self-improvement kit:

.claude/commands/eval.md
eval/rubric.md
eval/role-baselines/
eval/meta/
eval/run-logs/README.md
docs/eval/exam-architect-playbook.md
docs/eval/model-tuning.md

The workflow now prefers four roles when the runtime supports it:

Parent orchestrator: owns preflight, git writes, run logs, and commit/revert decisions.
Scorer/judge: strongest available reasoning model; generates exams and critiques answers.
Editor: faster strong model; edits only the requested agent prompt.
Adjudicator: optional different model family for close deltas, high-risk roles, or release scoring.

The eval command now requires before/after or score-only baseline runs to persist full question artifacts before answers are generated. Focus labels and weak-area summaries are no longer enough. Retests must identify whether they used exact baseline questions or fresh comparable questions.
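A persisted-artifact requirement like this can be checked mechanically. The sketch below is a hypothetical Python illustration: it assumes the question artifact can be loaded as a mapping from question ID to question text, which these notes do not specify. The `artifact_problems` and `retest_kind` helpers and the two label strings are invented for the example.

```python
# The 25 IDs named in the release notes: Q001 .. Q025.
EXPECTED_IDS = [f"Q{n:03d}" for n in range(1, 26)]


def artifact_problems(questions: dict[str, str]) -> list[str]:
    """List what keeps a question artifact from being a complete Q001-Q025 set."""
    problems = []
    for qid in EXPECTED_IDS:
        if qid not in questions:
            problems.append(f"missing {qid}")
        elif not questions[qid].strip():
            problems.append(f"empty {qid}")
    for qid in sorted(set(questions) - set(EXPECTED_IDS)):
        problems.append(f"unexpected {qid}")
    return problems


def retest_kind(baseline: dict[str, str], retest: dict[str, str]) -> str:
    """Label a retest by whether it reused the exact baseline questions."""
    return "exact-baseline" if retest == baseline else "fresh-comparable"


baseline = {qid: f"Question text for {qid}" for qid in EXPECTED_IDS}
print(artifact_problems(baseline))              # []
print(retest_kind(baseline, dict(baseline)))    # exact-baseline
```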

Cleanup

The old Python/DSPy harness was removed because it was not the active path for agent improvement. Deleted components included the deeper harness implementation, schema models, legacy JSON rubrics, tests, and the shell runner.

This reduces maintenance burden and makes the repo's active improvement path clearer for both Codex and Claude Code.

Validation

Validation performed before release:

bash scripts/lint-agents.sh
git diff --check
Exact-question retests using retained Q001-Q025 question artifacts for the remaining 36-agent pass.

The final tracked state contains 51 lint-clean agent prompts and the simplified eval stack.


09 Apr 21:16

ajhcs

v1.0.0

7a4a167

v1.0.0 — 51 Healthcare Admin Agents

Release Notes: 10-Agent Eval Loop Milestone

Date: 2026-04-09

Headline

Ten of the 51 healthcare administration agents now score 80+ on a rigorous 0-100 automated eval, up from none. This is the first known automated improvement loop for healthcare admin AI agents: iterative exam generation, rubric-locked scoring, targeted prompt edits, and git-ratcheted commits, all running without human intervention.

What Changed

We shipped a complete /eval improvement loop. Each iteration works like this:

Generate a 25-question domain exam from the agent's system prompt.
Score answers against a frozen rubric weighted Accuracy 0.40, Completeness 0.35, Specificity 0.25.
Identify the weakest areas and propose targeted prompt edits (with explicit identity-preservation constraints so prompts get sharper, not flatter).
Edit the agent prompt: additive, high-leverage changes only, respecting a fixed line cap.
Re-score using the same frozen question set.
Commit or revert automatically. If the score improved, the edit stays and a row is appended to eval/results.tsv. If not, the file is restored. No regressions ship.
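The rubric weighting in the scoring step can be illustrated with a small worked example. This sketch assumes each dimension is scored on the same 0-100 scale and combined as a simple weighted sum; the notes give the weights but not the exact formula, and the sub-scores below are made up for illustration.

```python
# Weights from the frozen rubric described above.
WEIGHTS = {"accuracy": 0.40, "completeness": 0.35, "specificity": 0.25}


def weighted_score(subscores: dict[str, float]) -> float:
    """Combine per-dimension 0-100 scores into one 0-100 score (assumed linear)."""
    return sum(WEIGHTS[name] * subscores[name] for name in WEIGHTS)


# Hypothetical sub-scores, not taken from any real run.
example = {"accuracy": 85.0, "completeness": 80.0, "specificity": 75.0}

# 0.40 * 85 + 0.35 * 80 + 0.25 * 75 = 34.00 + 28.00 + 18.75 = 80.75
print(round(weighted_score(example), 2))
```

Because accuracy carries the largest weight, a 10-point accuracy gain moves the total by 4 points under this formula, while the same gain in specificity moves it by 2.5.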

The loop uses a split-role architecture: a strong scorer/judge model generates exams and critiques, a faster editor model patches prompts, and a parent orchestrator owns git writes and the append-only log. This avoids the identity drift that comes from letting a single model optimize itself.

Agents Improved

All 10 agents crossed the 80-point threshold. Best post-edit scores:

| Agent | Best Score | Key Improvements |
| --- | --- | --- |
| Revenue Medical Coding Specialist | 82.15 | LCD/NCD medical necessity, charge capture workflows, global-period and anesthesia coding |
| Revenue Finance Manager | 81.55 | Multi-campus cost reports, capital post-implementation review, zero-based budgeting |
| Revenue 340B Program Manager | 81.20 | Orphan-drug exclusion, Medicare Part B modifier mechanics, ADR/CMP dispute workflow |
| Quality Compliance Officer | 81.15 | HIPAA breach exceptions, Stark failure modes, exclusion reinstatement controls |
| Healthcare Interoperability Engineer | 81.10 | SMART on FHIR auth/PKCE/JWT, HL7 ACK idempotency, TEFCA patient-matching governance |
| Quality Process Improvement Analyst | 80.85 | Managed-care QAPI (42 CFR 438.330), sentinel event RCA/CAPA, risk-adjusted outcomes |
| Revenue Cycle Specialist | 80.65 | 835/ERA posting controls, credit balance workflow, denial-type-specific appeal assembly |
| Revenue Contract Analyst | 80.45 | Contract build hierarchy, outpatient edit logic, prompt-pay and offset economics |
| Payer Managed Care Analyst | 80.30 | Medicaid directed payments, settlement reconciliation, MA bid-to-revenue bridge |
| Health Informatics Manager | 80.30 | FHIR/SMART production ops, public-health reporting controls, identity and downtime governance |

The Medical Coding Specialist saw the largest single-iteration gain at +11.00 points. The Compliance Officer improved +9.05 in one pass. Most agents required 1-3 iterations to cross 80.

What Was Added to Prompts

The eval loop does not add generic advice. It adds the specific knowledge that domain practitioners would expect and that the rubric penalizes when absent:

CFR citations: 42 CFR 412.106(b) (DSH qualification), 42 CFR Part 419 (OPPS), 42 CFR 412.4(f) (transfer DRGs), 42 CFR 438.330 (managed-care QAPI), 45 CFR 164 (HIPAA Security Rule controls)
Worked calculation examples: IME and DGME formulas with FTE rules, DSH patient-percentage calculations, OPPS APC payment formulas, transfer DRG per diem calculations
Regulatory formulas: IRC 141/148 bond compliance, payer mix shift methodology, outlier payment reconciliation mechanics
Audit process details: MAC audit lifecycle with PRRB appeal paths, reasonable collection effort criteria per CMS Pub 15-1 Section 310, HRSA ADR/CMP dispute-file requirements
Debt covenant structures: Specific threshold structures, credit-balance and overpayment workflows, prompt-pay and offset modeling
Payment mechanics: 835/ERA control points, Medicaid supplemental payment modeling, observation-status and 340B outpatient economics, Medicare Advantage bid/rebate/RAF revenue bridges

Infrastructure Shipped

Three pull requests delivered the full stack:

| PR | Title | Scope |
| --- | --- | --- |
| #4 | Enrich all 51 agent prompts with examples and seeds | 53 files: baseline prompt enrichment across every agent |
| #5 | Add eval calibration infrastructure and provider framework | 709 files: rubric, scoring harness, provider framework, calibration tooling |
| #6 | Add orchestrator agent and lifecycle documentation | 11 files: orchestrator design, lifecycle docs, design specs |

Merged separately: #7 landed the 10-agent improvements themselves (18 files, 759 additions).

Calibration Results

A pilot calibration run validated the scoring infrastructure before the improvement loop began:

Mean calibration delta: +0.198 (scorer alignment improved significantly after rubric tuning)
Lint pass rate: 0.48 to 0.88 (an increase of 0.40, roughly 1.8 times the starting rate)

The frozen rubric at eval/rubric.md locks the scoring weights so improvements are comparable across iterations and agents. Scores from different iterations of the same agent are not directly compared; the same-question pre/post design within each iteration is the unit of measurement.

Design Decisions Worth Noting

Split-role architecture. A single model optimizing its own prompt tends to drift toward generic executive tone and lose domain edge. Separating the scorer (which identifies gaps and specifies what to preserve) from the editor (which patches the file) keeps prompts sharp.

Identity preservation. The scorer returns identity_to_preserve and anti_patterns_to_avoid alongside weak_areas. The editor is constrained to make the prompt more capable, not more average. This is why prompts gained specific CFR citations and payment formulas instead of broad best-practice boilerplate.

Line cap enforcement. Each agent file has a fixed line cap based on its baseline. Edits must fit within the cap. This forces compression and prioritization rather than unbounded growth.
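A line-cap gate of this kind is simple to express. The notes do not say how the cap is derived from the baseline, so in this hypothetical Python sketch the cap is just a number passed in by the caller, and `fits_line_cap` is an invented helper name.

```python
def fits_line_cap(prompt_text: str, line_cap: int) -> bool:
    """Return True when the edited prompt has no more lines than its cap."""
    return len(prompt_text.splitlines()) <= line_cap


# Hypothetical example: a three-line prompt checked against two different caps.
edited = "You are a revenue cycle specialist.\nCheck 835/ERA posting controls.\nEscalate credit balances.\n"

print(fits_line_cap(edited, line_cap=3))   # True
print(fits_line_cap(edited, line_cap=2))   # False
```

An edit that fails the check would have to compress or drop existing text before it could be scored, which is the prioritization pressure the design note describes.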

Git ratchet. Every improvement is committed atomically. Every failed edit is reverted. The append-only eval/results.tsv log provides a complete audit trail. No regressions ship.
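The keep-or-revert decision can be sketched without any git calls. The Python below is a hypothetical illustration: the `ratchet` helper, the column layout of the log row, and the sample scores are assumptions, and the real loop additionally commits or restores the prompt file in git. It follows the rule stated earlier in these notes, where a row is appended to the log only when the score improved.

```python
import csv
import os
import tempfile


def ratchet(agent: str, pre: float, post: float, log_path: str) -> str:
    """Return "commit" and append a log row if the score improved, else "revert"."""
    if post > pre:
        # Append-only: existing rows are never rewritten.
        with open(log_path, "a", newline="") as handle:
            writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
            writer.writerow([agent, f"{pre:.2f}", f"{post:.2f}", f"{post - pre:+.2f}"])
        return "commit"
    return "revert"


# Hypothetical scores, written to a throwaway log rather than eval/results.tsv.
log_file = os.path.join(tempfile.mkdtemp(), "results.tsv")
print(ratchet("demo-agent", 70.0, 81.0, log_file))   # commit
print(ratchet("demo-agent", 81.0, 80.5, log_file))   # revert
```

A strict `post > pre` comparison means a tie is reverted; whether the actual loop treats ties that way is not stated in the notes.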

What Is Next

41 agents remaining. The loop is proven and repeatable. Target cadence is roughly 5 agents per week.
30+ agents at 80+ by mid-May. That threshold gives publishable coverage across all 10 agent categories (Revenue, Clinical, Quality, Payer, Operations, Health IT, Population Health, Pharmacy, Strategy, Emergency Preparedness).
Second-pass depth. Agents already at 80 can run additional iterations targeting 85+ as the rubric surface area becomes well-understood.
Cross-agent consistency. As more agents pass through the loop, patterns in weak areas will inform batch improvements to prompt architecture across the full set.


Releases: ajhcs/healthcare-agents