Add exp-test-boilerplate-detection and exp-assertion-quality skills by Evangelink · Pull Request #468 · dotnet/skills

Evangelink · 2026-03-30T09:23:20Z

Two new experimental skills for the dotnet-experimental plugin:

- exp-test-boilerplate-detection: Detects duplicate boilerplate patterns across .NET test suites and identifies refactoring opportunities. Categories: repeated construction, assertion patterns, copy-paste methods, duplicated setup/teardown, repeated infrastructure.
- exp-assertion-quality: Analyzes assertion variety and depth across .NET test suites. Produces a metrics dashboard covering assertion count, type spread, percentage of trivial assertions, assertion-free tests, and negative assertion coverage.

Both skills include an eval.yaml with 3-4 scenarios each (positive, negative, and non-activation), plus test fixtures.
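
The skills themselves are instructions in SKILL.md rather than code, so nothing in this PR implements detection programmatically. As a rough illustration of one signal behind the "repeated construction" and "duplicated setup/teardown" categories, the Python sketch below normalizes each statement in a set of test-method bodies (masking string and number literals) and reports statements shared by several methods. The function names, the default threshold, and the sample C# lines are hypothetical and are not taken from the skill or its fixtures.

```python
import re
from collections import defaultdict

# Mask literals so statements that differ only in test data
# (order IDs, quantities) are treated as the same boilerplate.
_STRING = re.compile(r'"(?:[^"\\]|\\.)*"')
_NUMBER = re.compile(r"\b\d+(?:\.\d+)?\b")


def normalize(line: str) -> str:
    """Trim, mask string/number literals, and collapse runs of whitespace."""
    line = _STRING.sub('"<str>"', line.strip())
    line = _NUMBER.sub("<num>", line)
    return re.sub(r"\s+", " ", line)


def find_repeated_statements(
    methods: dict[str, list[str]], min_methods: int = 3
) -> dict[str, list[str]]:
    """Return normalized statements that occur in at least `min_methods`
    distinct test methods, mapped to the sorted names of those methods."""
    seen: dict[str, set[str]] = defaultdict(set)
    for name, body in methods.items():
        for raw in body:
            stmt = normalize(raw)
            if stmt and stmt not in ("{", "}"):
                seen[stmt].add(name)
    return {
        stmt: sorted(names)
        for stmt, names in seen.items()
        if len(names) >= min_methods
    }


# Hypothetical test-method bodies (C# source lines), not the PR's fixtures.
SAMPLE_TESTS = {
    "Process_ValidOrder_Succeeds": [
        "var repo = new InMemoryOrderRepository();",
        "var processor = new OrderProcessor(repo);",
        'var result = processor.Process(new Order("A-1", 2));',
        "Assert.IsTrue(result.Success);",
    ],
    "Process_LargeOrder_Succeeds": [
        "var repo = new InMemoryOrderRepository();",
        "var processor = new OrderProcessor(repo);",
        'var result = processor.Process(new Order("B-7", 500));',
        "Assert.IsTrue(result.Success);",
    ],
    "Process_EmptyOrder_Fails": [
        "var repo = new InMemoryOrderRepository();",
        "var processor = new OrderProcessor(repo);",
        'var result = processor.Process(new Order("C-3", 0));',
        "Assert.IsFalse(result.Success);",
    ],
}

if __name__ == "__main__":
    for stmt, names in find_repeated_statements(SAMPLE_TESTS).items():
        print(f"{len(names)} methods: {stmt}")
```

On the sample, the repository and processor construction lines and the literal-masked `Process` call are shared by all three methods, which is the kind of repetition an MSTest `[TestInitialize]` method or a small factory helper would absorb. The `Assert.IsTrue` line appears in only two methods and falls below the default threshold.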

Evangelink · 2026-03-30T09:23:32Z

/evaluate


Copilot

Pull request overview

Adds two new experimental skills under the dotnet-experimental plugin to (1) detect duplicated boilerplate across .NET test suites and (2) evaluate assertion quality/diversity, each with accompanying eval scenarios and fixture projects.

Changes:

- Added exp-test-boilerplate-detection skill documentation plus eval.yaml and MSTest fixture suites for “heavy” vs “minimal” boilerplate.
- Added exp-assertion-quality skill documentation plus eval.yaml and fixture suites representing low-diversity, good-diversity, and assertion-free tests.
- Introduced non-activation eval scenarios intended to ensure these skills don’t activate for “write tests from scratch” requests.

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 4 comments.


File	Description
tests/dotnet-experimental/exp-test-boilerplate-detection/fixtures/minimal-boilerplate/Calculator.Tests/CalculatorTests.cs	Minimal-boilerplate MSTest fixture.
tests/dotnet-experimental/exp-test-boilerplate-detection/fixtures/minimal-boilerplate/Calculator.Tests/Calculator.Tests.csproj	MSTest fixture project for minimal-boilerplate scenario.
tests/dotnet-experimental/exp-test-boilerplate-detection/fixtures/heavy-boilerplate/OrderService.Tests/OrderService.Tests.csproj	MSTest fixture project for heavy-boilerplate scenario.
tests/dotnet-experimental/exp-test-boilerplate-detection/fixtures/heavy-boilerplate/OrderService.Tests/OrderProcessorTests.cs	Heavy-boilerplate MSTest fixture with repeated setup and assertions.
tests/dotnet-experimental/exp-test-boilerplate-detection/eval.yaml	Eval scenarios for boilerplate detection (heavy/minimal/non-activation).
tests/dotnet-experimental/exp-assertion-quality/fixtures/low-diversity/PaymentService.Tests/PaymentService.Tests.csproj	MSTest fixture project for low-diversity assertions.
tests/dotnet-experimental/exp-assertion-quality/fixtures/low-diversity/PaymentService.Tests/PaymentProcessorTests.cs	Low-diversity assertion fixture (mostly `AreEqual`).
tests/dotnet-experimental/exp-assertion-quality/fixtures/good-diversity/UserService.Tests/UserService.Tests.csproj	MSTest fixture project for good-diversity assertions.
tests/dotnet-experimental/exp-assertion-quality/fixtures/good-diversity/UserService.Tests/UserManagerTests.cs	Good-diversity assertion fixture with null/exception/collection assertions.
tests/dotnet-experimental/exp-assertion-quality/fixtures/assertion-free/SmokeTests/SmokeTests.csproj	MSTest fixture project for assertion-free/trivial assertions.
tests/dotnet-experimental/exp-assertion-quality/fixtures/assertion-free/SmokeTests/ApiEndpointSmokeTests.cs	Assertion-free/trivial smoke test fixture.
tests/dotnet-experimental/exp-assertion-quality/eval.yaml	Eval scenarios for assertion-quality skill (low-diversity/assertion-free/good-diversity/non-activation).
plugins/dotnet-experimental/skills/exp-test-boilerplate-detection/SKILL.md	Skill instructions and reporting format for boilerplate detection.
plugins/dotnet-experimental/skills/exp-assertion-quality/SKILL.md	Skill instructions, classification scheme, and metrics dashboard definition.
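
To make the dashboard metrics concrete, here is a minimal Python sketch that computes them from a per-test list of MSTest assertion method names. The category map, the choice of `IsNotNull` as the only "trivial" assertion, and the set of "negative" assertions are illustrative assumptions, not the classification scheme defined in the skill's SKILL.md.

```python
from dataclasses import dataclass

# Illustrative categories keyed by MSTest assertion method name
# (an assumption for this sketch, not the skill's own scheme).
CATEGORY = {
    "AreEqual": "equality", "AreNotEqual": "equality",
    "IsTrue": "boolean", "IsFalse": "boolean",
    "IsNull": "null", "IsNotNull": "null",
    "ThrowsException": "exception",
    "Contains": "collection", "DoesNotContain": "collection",
    "IsInstanceOfType": "type",
}
# Existence-only checks counted as "trivial" (assumption).
TRIVIAL = {"IsNotNull"}
# Assertions that verify absence, inequality, or failure (assumption).
NEGATIVE = {"AreNotEqual", "IsFalse", "IsNull", "ThrowsException", "DoesNotContain"}


@dataclass
class Dashboard:
    assertion_count: int
    type_spread: int              # distinct assertion categories used
    trivial_pct: float            # share of assertions that are trivial
    assertion_free: list[str]     # tests with no assertions at all
    negative_coverage_pct: float  # share of tests with a negative assertion


def build_dashboard(tests: dict[str, list[str]]) -> Dashboard:
    """`tests` maps each test name to the assertion methods it calls."""
    calls = [a for asserts in tests.values() for a in asserts]
    count = len(calls)
    trivial = sum(a in TRIVIAL for a in calls)
    with_negative = sum(any(a in NEGATIVE for a in asserts) for asserts in tests.values())
    return Dashboard(
        assertion_count=count,
        type_spread=len({CATEGORY.get(a, "other") for a in calls}),
        trivial_pct=round(100 * trivial / count, 1) if count else 0.0,
        assertion_free=sorted(name for name, asserts in tests.items() if not asserts),
        negative_coverage_pct=round(100 * with_negative / len(tests), 1) if tests else 0.0,
    )


# Hypothetical suite; not one of the PR's fixtures.
SAMPLE = {
    "Charge_ValidCard_ReturnsApproved": ["AreEqual", "AreEqual"],
    "Charge_ExpiredCard_Throws": ["ThrowsException"],
    "Refund_ReturnsReceipt": ["IsNotNull"],
    "Ping_DoesNotCrash": [],
}

if __name__ == "__main__":
    print(build_dashboard(SAMPLE))
```

For the sample this yields 4 assertions across 3 categories, 25.0% trivial, one assertion-free test (`Ping_DoesNotCrash`), and a negative assertion in 1 of 4 tests (25.0%).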

Comments suppressed due to low confidence (2)

tests/dotnet-experimental/exp-test-boilerplate-detection/eval.yaml:87

The non-activation scenario's assertion regex includes the bare word `test`, which is so broad that it can match ordinary prose and won’t reliably confirm that test code was produced. Consider tightening the assertion to require concrete code markers (e.g., `\[TestMethod\]`, `Assert\.`) or using a file_contains assertion if you expect code to be emitted into a file.

    assertions:
      - type: "output_matches"
        pattern: "(TestMethod|TestClass|\\[Fact\\]|test)"
    rubric:

tests/dotnet-experimental/exp-assertion-quality/eval.yaml:114

The non-activation scenario's assertion regex includes the bare word `test`, which is likely to match non-code responses and doesn’t reliably assert that a test suite was produced. Tighten it to require concrete MSTest/xUnit/NUnit code markers (e.g., `\[TestMethod\]`, `\[TestClass\]`, `Assert\.`), or use file-based assertions if applicable.

    assertions:
      - type: "output_matches"
        pattern: "(TestMethod|TestClass|\\[Fact\\]|test)"
    rubric:
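
The reviewer's concern is easy to reproduce. Assuming `output_matches` performs an ordinary regular-expression search over the model's response, and that the YAML double-quoted string turns `\\[` into `\[`, the Python sketch below shows the original pattern matching plain prose while a tighter, marker-only pattern does not. The tight pattern is one possible variant for illustration, not the change made in this PR; the alternation and escaped brackets it uses behave the same way in Python's `re` and .NET's `Regex`.

```python
import re

# Pattern from the eval.yaml snippet after YAML unescaping ("\\[" -> "\[").
BROAD = re.compile(r"(TestMethod|TestClass|\[Fact\]|test)")
# One possible tighter pattern: only attribute or assertion syntax that
# appears in MSTest, xUnit, or NUnit test code (illustrative, not the PR's fix).
TIGHT = re.compile(r"(\[TestMethod\]|\[TestClass\]|\[Fact\]|\[Test\]|Assert\.)")

PROSE = "Sure, I can outline how to test the Calculator class first."
CODE = """[TestClass]
public class CalculatorTests
{
    [TestMethod]
    public void Add_ReturnsSum() => Assert.AreEqual(4, 2 + 2);
}"""


def matches(pattern: re.Pattern, text: str) -> bool:
    """True if the pattern occurs anywhere in the text (search, not fullmatch)."""
    return pattern.search(text) is not None


if __name__ == "__main__":
    for label, text in (("prose", PROSE), ("code", CODE)):
        print(label, "broad:", matches(BROAD, text), "tight:", matches(TIGHT, text))
```

Because the broad pattern accepts the prose sentence, a response that merely talks about testing (or declines to write tests) would still pass the assertion.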


github-actions · 2026-03-30T09:32:27Z

Skill Validation Results

Skill	Scenario	Quality	Skills Loaded	Overfit	Verdict
exp-assertion-quality	Identify low assertion diversity in equality-dominated test suite	4.0/5 → 5.0/5 🟢	✅ exp-assertion-quality; tools: skill / ✅ exp-assertion-quality; tools: skill, glob	✅ 0.13	✅
exp-assertion-quality	Flag assertion-free tests and trivial-only assertions	3.3/5 → 4.0/5 🟢	✅ exp-assertion-quality; tools: skill / ✅ exp-assertion-quality; tools: skill	✅ 0.13	✅
exp-assertion-quality	Recognize well-diversified assertion usage	3.0/5 → 5.0/5 🟢	✅ exp-assertion-quality; tools: skill / ✅ exp-assertion-quality; tools: skill	✅ 0.13	✅
exp-assertion-quality	Decline request to write new tests from scratch	4.0/5 ⏰ → 2.3/5 ⏰ 🔴	ℹ️ not activated (expected) / ℹ️ not activated (expected)	✅ 0.13	❌
exp-test-boilerplate-detection	Detect repeated object construction and setup across test methods	3.0/5 → 4.7/5 🟢	✅ exp-test-boilerplate-detection; tools: skill / ✅ exp-test-boilerplate-detection; tools: skill	✅ 0.06	✅
exp-test-boilerplate-detection	Recognize tests with minimal boilerplate that need no refactoring	4.0/5 → 4.3/5 🟢	✅ exp-test-boilerplate-detection; tools: skill / ✅ exp-test-boilerplate-detection; tools: skill	✅ 0.06	✅
exp-test-boilerplate-detection	Decline request to write new tests	4.0/5 → 4.3/5 🟢	ℹ️ not activated (expected) / ℹ️ not activated (expected)	✅ 0.06	✅

⏰ timeout: run(s) hit the 120 s scenario timeout limit, so scoring may be affected because model execution was aborted before it could produce its full output (increase the limit via timeout in eval.yaml)

Model: claude-opus-4.6 | Judge: claude-opus-4.6


Evangelink · 2026-03-30T12:52:08Z

/evaluate

github-actions · 2026-03-30T13:01:05Z

Skill Validation Results

Skill	Scenario	Quality	Skills Loaded	Overfit	Verdict
exp-assertion-quality	Identify low assertion diversity in equality-dominated test suite	4.0/5 → 5.0/5 🟢	✅ exp-assertion-quality; tools: skill, glob / ✅ exp-assertion-quality; tools: skill	✅ 0.10	✅
exp-assertion-quality	Flag assertion-free tests and trivial-only assertions	3.0/5 → 4.0/5 🟢	✅ exp-assertion-quality; tools: skill / ✅ exp-assertion-quality; tools: skill	✅ 0.10	✅
exp-assertion-quality	Recognize well-diversified assertion usage	3.0/5 → 4.3/5 🟢	✅ exp-assertion-quality; tools: skill / ✅ exp-assertion-quality; tools: skill	✅ 0.10	✅
exp-assertion-quality	Decline request to write new tests from scratch	2.7/5 ⏰ → 2.7/5 ⏰	ℹ️ not activated (expected) / ℹ️ not activated (expected)	✅ 0.10	❌
exp-test-boilerplate-detection	Detect repeated object construction and setup across test methods	3.0/5 → 5.0/5 🟢	✅ exp-test-boilerplate-detection; tools: skill, glob / ✅ exp-test-boilerplate-detection; tools: skill	✅ 0.07	✅
exp-test-boilerplate-detection	Recognize tests with minimal boilerplate that need no refactoring	4.0/5 → 4.7/5 🟢	✅ exp-test-boilerplate-detection; tools: skill / ✅ exp-test-boilerplate-detection; tools: skill	✅ 0.07	✅
exp-test-boilerplate-detection	Decline request to write new tests	4.0/5 → 4.3/5 🟢	ℹ️ not activated (expected) / ℹ️ not activated (expected)	✅ 0.07	✅

⏰ timeout: run(s) hit the 120 s scenario timeout limit, so scoring may be affected because model execution was aborted before it could produce its full output (increase the limit via timeout in eval.yaml)

Model: claude-opus-4.6 | Judge: claude-opus-4.6

📖 See InvestigatingResults.md for how to diagnose failures. Additional debugging guidance may be provided by your workflow.


Evangelink requested review from dbreshears and timheuer as code owners March 30, 2026 09:23

Copilot AI review requested due to automatic review settings March 30, 2026 09:23

Copilot started reviewing on behalf of Evangelink March 30, 2026 09:23

Evangelink force-pushed the dev/amauryleve/metrics branch from f0ad4b6 to 71f2cae

Copilot AI reviewed Mar 30, 2026

JanKrivanek approved these changes Mar 30, 2026

Evangelink enabled auto-merge (squash) March 30, 2026 12:26

JanKrivanek disabled auto-merge March 30, 2026 12:34

Evangelink closed this Mar 30, 2026

Evangelink reopened this Mar 30, 2026

Evangelink enabled auto-merge (squash) March 30, 2026 12:51

Evangelink merged commit b527b41 into main Mar 30, 2026
58 checks passed

Evangelink deleted the dev/amauryleve/metrics branch March 30, 2026 13:01

Evangelink merged 1 commit into main from dev/amauryleve/metrics

3 participants
