Add exp-test-boilerplate-detection and exp-assertion-quality skills #468

Merged: Evangelink merged 1 commit into main from dev/amauryleve/metrics on Mar 30, 2026

@Evangelink
Member

Two new experimental skills for the dotnet-experimental plugin:

  • exp-test-boilerplate-detection: Detects duplicate boilerplate patterns across .NET test suites and identifies refactoring opportunities. Categories: repeated construction, assertion patterns, copy-paste methods, duplicated setup/teardown, repeated infrastructure.

  • exp-assertion-quality: Analyzes assertion variety and depth across .NET test suites. Produces a metrics dashboard including assertion count, type spread, trivial %, assertion-free tests, and negative assertion coverage.

Both skills include an eval.yaml with 3-4 scenarios each (positive, negative, and non-activation) plus test fixtures.
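
As a rough illustration of the dashboard metrics named above, the following Python sketch counts assertions in MSTest-style source text. It is not the skill's implementation (the skill is defined by its SKILL.md instructions). The sample source, the regular expressions, the function name `assertion_metrics`, and the working definitions of "trivial" and "negative" assertions are all assumptions made for this example.

```python
import re
from collections import Counter

# Hypothetical sample: three MSTest-style test methods as plain text.
SAMPLE = '''
[TestMethod]
public void Add_ReturnsSum()
{
    Assert.AreEqual(5, calc.Add(2, 3));
    Assert.AreEqual(0, calc.Add(0, 0));
}

[TestMethod]
public void Ping_DoesNotCrash()
{
    client.Ping();
}

[TestMethod]
public void Divide_ByZero_Throws()
{
    Assert.ThrowsException<DivideByZeroException>(() => calc.Divide(1, 0));
    Assert.IsTrue(true);
}
'''

# Captures the method name of every Assert.<Method> call.
ASSERT_CALL = re.compile(r"\bAssert\.(\w+)")
# Assumption: "trivial" means an assertion that can never fail.
TRIVIAL = re.compile(r"\bAssert\.(?:IsTrue\(\s*true\s*\)|IsFalse\(\s*false\s*\))")
# Assumption: "negative" coverage is approximated by exception assertions.
NEGATIVE = {"ThrowsException", "ThrowsExceptionAsync", "ThrowsExactly", "Throws"}


def assertion_metrics(source: str) -> dict:
    """Compute simple assertion metrics for one source file given as text."""
    # Split on the attribute; the first chunk precedes any test method.
    tests = source.split("[TestMethod]")[1:]
    kinds = Counter()
    assertion_free = 0
    trivial = 0
    for body in tests:
        calls = ASSERT_CALL.findall(body)
        if not calls:
            assertion_free += 1
        kinds.update(calls)
        trivial += len(TRIVIAL.findall(body))
    total = sum(kinds.values())
    return {
        "tests": len(tests),
        "assertion_count": total,
        "type_spread": len(kinds),  # number of distinct Assert methods used
        "trivial_pct": round(100 * trivial / total, 1) if total else 0.0,
        "assertion_free_tests": assertion_free,
        "negative_assertions": sum(n for k, n in kinds.items() if k in NEGATIVE),
    }


if __name__ == "__main__":
    print(assertion_metrics(SAMPLE))
```

A text-based count like this is only a heuristic: it ignores comments, helper methods that wrap assertions, and other test frameworks, which is part of why the real analysis is left to the skill.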

Copilot AI review requested due to automatic review settings March 30, 2026 09:23
@Evangelink
Member Author

/evaluate

@Evangelink Evangelink force-pushed the dev/amauryleve/metrics branch from f0ad4b6 to 71f2cae March 30, 2026 09:24
Contributor

Copilot AI left a comment

Pull request overview

Adds two new experimental skills under the dotnet-experimental plugin to (1) detect duplicated boilerplate across .NET test suites and (2) evaluate assertion quality/diversity, each with accompanying eval scenarios and fixture projects.

Changes:

  • Added exp-test-boilerplate-detection skill documentation plus eval.yaml and MSTest fixture suites for “heavy” vs “minimal” boilerplate.
  • Added exp-assertion-quality skill documentation plus eval.yaml and fixture suites representing low-diversity, good-diversity, and assertion-free tests.
  • Introduced non-activation eval scenarios intended to ensure these skills don’t activate for “write tests from scratch” requests.

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| tests/dotnet-experimental/exp-test-boilerplate-detection/fixtures/minimal-boilerplate/Calculator.Tests/CalculatorTests.cs | Minimal-boilerplate MSTest fixture. |
| tests/dotnet-experimental/exp-test-boilerplate-detection/fixtures/minimal-boilerplate/Calculator.Tests/Calculator.Tests.csproj | MSTest fixture project for minimal-boilerplate scenario. |
| tests/dotnet-experimental/exp-test-boilerplate-detection/fixtures/heavy-boilerplate/OrderService.Tests/OrderService.Tests.csproj | MSTest fixture project for heavy-boilerplate scenario. |
| tests/dotnet-experimental/exp-test-boilerplate-detection/fixtures/heavy-boilerplate/OrderService.Tests/OrderProcessorTests.cs | Heavy-boilerplate MSTest fixture with repeated setup and assertions. |
| tests/dotnet-experimental/exp-test-boilerplate-detection/eval.yaml | Eval scenarios for boilerplate detection (heavy/minimal/non-activation). |
| tests/dotnet-experimental/exp-assertion-quality/fixtures/low-diversity/PaymentService.Tests/PaymentService.Tests.csproj | MSTest fixture project for low-diversity assertions. |
| tests/dotnet-experimental/exp-assertion-quality/fixtures/low-diversity/PaymentService.Tests/PaymentProcessorTests.cs | Low-diversity assertion fixture (mostly AreEqual). |
| tests/dotnet-experimental/exp-assertion-quality/fixtures/good-diversity/UserService.Tests/UserService.Tests.csproj | MSTest fixture project for good-diversity assertions. |
| tests/dotnet-experimental/exp-assertion-quality/fixtures/good-diversity/UserService.Tests/UserManagerTests.cs | Good-diversity assertion fixture with null/exception/collection assertions. |
| tests/dotnet-experimental/exp-assertion-quality/fixtures/assertion-free/SmokeTests/SmokeTests.csproj | MSTest fixture project for assertion-free/trivial assertions. |
| tests/dotnet-experimental/exp-assertion-quality/fixtures/assertion-free/SmokeTests/ApiEndpointSmokeTests.cs | Assertion-free/trivial smoke test fixture. |
| tests/dotnet-experimental/exp-assertion-quality/eval.yaml | Eval scenarios for assertion-quality skill (low-diversity/assertion-free/good-diversity/non-activation). |
| plugins/dotnet-experimental/skills/exp-test-boilerplate-detection/SKILL.md | Skill instructions and reporting format for boilerplate detection. |
| plugins/dotnet-experimental/skills/exp-assertion-quality/SKILL.md | Skill instructions, classification scheme, and metrics dashboard definition. |
Comments suppressed due to low confidence (2)

tests/dotnet-experimental/exp-test-boilerplate-detection/eval.yaml:87

  • The non-activation scenario assertion regex includes the bare word test, which is so broad it can match ordinary prose and won’t reliably confirm that test code was produced. Consider tightening the assertion to require concrete code markers (e.g., \[TestMethod\], Assert\.) or a file_contains assertion if you expect code to be emitted into a file.
    assertions:
      - type: "output_matches"
        pattern: "(TestMethod|TestClass|\\[Fact\\]|test)"
    rubric:

tests/dotnet-experimental/exp-assertion-quality/eval.yaml:114

  • The non-activation scenario assertion regex includes the bare word test, which is likely to match non-code responses and doesn’t reliably assert that a test suite was produced. Tighten this to require concrete MSTest/xUnit/NUnit code markers (e.g., \[TestMethod\], \[TestClass\], Assert\.) or use file-based assertions if applicable.
    assertions:
      - type: "output_matches"
        pattern: "(TestMethod|TestClass|\\[Fact\\]|test)"
    rubric:
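
To make the concern in the two suppressed comments concrete, here is a small Python sketch comparing the quoted pattern with a stricter one. The `STRICT` pattern, the two sample strings, and the `matches` helper are assumptions for illustration, not the fix adopted in this PR, and the eval harness may use a different regex engine than Python's `re`.

```python
import re

# Pattern quoted in the review, written as a Python raw string.
# The YAML double-quoted form "\\[Fact\\]" unescapes to \[Fact\].
BROAD = re.compile(r"(TestMethod|TestClass|\[Fact\]|test)")

# Assumption: one possible tightening that requires concrete code markers,
# either a bracketed test attribute or an Assert.<Method>( call.
STRICT = re.compile(r"\[(TestMethod|TestClass|Fact|Test)\]|\bAssert\.\w+\(")

# Hypothetical agent outputs: prose with no code, and a real test method.
prose_only = "I can help you plan a test strategy, but I have not written any code yet."
real_code = "[TestMethod]\npublic void Add_ReturnsSum() { Assert.AreEqual(5, Add(2, 3)); }"


def matches(pattern: re.Pattern, text: str) -> bool:
    """Return True if the pattern occurs anywhere in the text."""
    return pattern.search(text) is not None


if __name__ == "__main__":
    for name, text in [("prose_only", prose_only), ("real_code", real_code)]:
        print(name, "broad:", matches(BROAD, text), "strict:", matches(STRICT, text))
```

With these samples the broad pattern matches both strings, because the prose contains the word "test", while the stricter pattern matches only the string that contains actual test code.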

Comment thread tests/dotnet-experimental/exp-test-boilerplate-detection/eval.yaml
Comment thread tests/dotnet-experimental/exp-assertion-quality/eval.yaml
Comment thread plugins/dotnet-experimental/skills/exp-assertion-quality/SKILL.md
@github-actions
Contributor

Skill Validation Results

Skill Scenario Quality Skills Loaded Overfit Verdict
exp-assertion-quality Identify low assertion diversity in equality-dominated test suite 4.0/5 → 5.0/5 🟢 ✅ exp-assertion-quality; tools: skill / ✅ exp-assertion-quality; tools: skill, glob ✅ 0.13
exp-assertion-quality Flag assertion-free tests and trivial-only assertions 3.3/5 → 4.0/5 🟢 ✅ exp-assertion-quality; tools: skill / ✅ exp-assertion-quality; tools: skill ✅ 0.13
exp-assertion-quality Recognize well-diversified assertion usage 3.0/5 → 5.0/5 🟢 ✅ exp-assertion-quality; tools: skill / ✅ exp-assertion-quality; tools: skill ✅ 0.13
exp-assertion-quality Decline request to write new tests from scratch 4.0/5 ⏰ → 2.3/5 ⏰ 🔴 ℹ️ not activated (expected) / ℹ️ not activated (expected) ✅ 0.13
exp-test-boilerplate-detection Detect repeated object construction and setup across test methods 3.0/5 → 4.7/5 🟢 ✅ exp-test-boilerplate-detection; tools: skill / ✅ exp-test-boilerplate-detection; tools: skill ✅ 0.06
exp-test-boilerplate-detection Recognize tests with minimal boilerplate that need no refactoring 4.0/5 → 4.3/5 🟢 ✅ exp-test-boilerplate-detection; tools: skill / ✅ exp-test-boilerplate-detection; tools: skill ✅ 0.06
exp-test-boilerplate-detection Decline request to write new tests 4.0/5 → 4.3/5 🟢 ℹ️ not activated (expected) / ℹ️ not activated (expected) ✅ 0.06

timeout — run(s) hit the (120s) scenario timeout limit; scoring may be impacted by aborting model execution before it could produce its full output (increase via timeout in eval.yaml)

Model: claude-opus-4.6 | Judge: claude-opus-4.6

🔍 Full results — includes quality and agent details

@Evangelink Evangelink enabled auto-merge (squash) March 30, 2026 12:26
@JanKrivanek JanKrivanek disabled auto-merge March 30, 2026 12:34
@Evangelink Evangelink closed this Mar 30, 2026
@Evangelink Evangelink reopened this Mar 30, 2026
@Evangelink Evangelink enabled auto-merge (squash) March 30, 2026 12:51
@Evangelink
Member Author

/evaluate

@Evangelink Evangelink merged commit b527b41 into main Mar 30, 2026
58 checks passed
@github-actions
Contributor

Skill Validation Results

Skill Scenario Quality Skills Loaded Overfit Verdict
exp-assertion-quality Identify low assertion diversity in equality-dominated test suite 4.0/5 → 5.0/5 🟢 ✅ exp-assertion-quality; tools: skill, glob / ✅ exp-assertion-quality; tools: skill ✅ 0.10
exp-assertion-quality Flag assertion-free tests and trivial-only assertions 3.0/5 → 4.0/5 🟢 ✅ exp-assertion-quality; tools: skill / ✅ exp-assertion-quality; tools: skill ✅ 0.10
exp-assertion-quality Recognize well-diversified assertion usage 3.0/5 → 4.3/5 🟢 ✅ exp-assertion-quality; tools: skill / ✅ exp-assertion-quality; tools: skill ✅ 0.10
exp-assertion-quality Decline request to write new tests from scratch 2.7/5 ⏰ → 2.7/5 ⏰ ℹ️ not activated (expected) / ℹ️ not activated (expected) ✅ 0.10
exp-test-boilerplate-detection Detect repeated object construction and setup across test methods 3.0/5 → 5.0/5 🟢 ✅ exp-test-boilerplate-detection; tools: skill, glob / ✅ exp-test-boilerplate-detection; tools: skill ✅ 0.07
exp-test-boilerplate-detection Recognize tests with minimal boilerplate that need no refactoring 4.0/5 → 4.7/5 🟢 ✅ exp-test-boilerplate-detection; tools: skill / ✅ exp-test-boilerplate-detection; tools: skill ✅ 0.07
exp-test-boilerplate-detection Decline request to write new tests 4.0/5 → 4.3/5 🟢 ℹ️ not activated (expected) / ℹ️ not activated (expected) ✅ 0.07

timeout — run(s) hit the (120s) scenario timeout limit; scoring may be impacted by aborting model execution before it could produce its full output (increase via timeout in eval.yaml)

Model: claude-opus-4.6 | Judge: claude-opus-4.6

📖 See InvestigatingResults.md for how to diagnose failures. Additional debugging guidance may be provided by your workflow.

🔍 Full Results - additional metrics and failure investigation steps

@Evangelink Evangelink deleted the dev/amauryleve/metrics branch March 30, 2026 13:01