
Align MSTest v2→v3 eval assertions/rubric with fixture semantics #550

Merged
Evangelink merged 4 commits into main from copilot/fix-comments-in-review-thread on Apr 20, 2026

Contributor

Copilot AI commented Apr 20, 2026

This PR addresses the unresolved review thread on tests/dotnet-test/migrate-mstest-v1v2-to-v3/eval.yaml by making the v2→v3 migration scenario consistent with what the fixture actually exercises. The goal is to avoid evaluating for AreNotEqual when the fixture demonstrates AreEqual/AreSame overload migration concerns.

  • Scenario assertion tightening (Goal 2: v2 NuGet → v3)

    • Added a targeted output_matches regex that explicitly checks for guidance on Assert.AreEqual/Assert.AreSame object-overload removal and generic type parameter migration language.
  • Rubric/fixture alignment

    • Updated rubric wording from Assert.AreEqual/AreNotEqual to Assert.AreEqual/AreSame, matching fixtures/v2-nuget/UserServiceTests.cs.
    • Kept scope limited to the referenced review thread feedback.
- type: "output_matches"
  pattern: "(Assert\\.(AreEqual|AreSame)(<[^>]+>)?|object overload|overload removal|remove[ds]? the object overload|generic type parameter[s]?)"

- "Warns about the Assert.AreEqual/AreSame object overload removal visible in UserServiceTests.cs"
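To see what the added pattern accepts, here is a minimal Python sketch that runs the regex from the assertion above against a few sample outputs. The sample strings are invented for illustration, and the sketch assumes the harness applies the pattern as a case-sensitive search anywhere in the agent output, which this thread does not confirm.

```python
import re

# Pattern from the output_matches assertion, after YAML unescapes "\\." to "\."
PATTERN = re.compile(
    r"(Assert\.(AreEqual|AreSame)(<[^>]+>)?|object overload|overload removal"
    r"|remove[ds]? the object overload|generic type parameter[s]?)"
)

# Hypothetical agent outputs, made up for this sketch
SAMPLES = {
    "full guidance": (
        "Assert.AreEqual<int>(1, x): the object overload was removed; "
        "add generic type parameters."
    ),
    "bare mention": "The tests call Assert.AreEqual in several places.",
    "unrelated": "Update the MSTest.TestFramework package reference to 3.x.",
}


def matches(text: str) -> bool:
    """Return True when the pattern is found anywhere in the text."""
    return PATTERN.search(text) is not None


results = {name: matches(text) for name, text in SAMPLES.items()}
for name, ok in results.items():
    print(f"{name}: {ok}")
```

Because the pattern is a single alternation, a bare mention of Assert.AreEqual is enough to satisfy it, even when the output contains no migration guidance.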

Agent-Logs-Url: https://github.com/dotnet/skills/sessions/055e6526-4b7d-4457-9004-2c8c99969cd1

Co-authored-by: Evangelink <11340282+Evangelink@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Fix code based on review comments" to "Align MSTest v2→v3 eval assertions/rubric with fixture semantics" on Apr 20, 2026
Copilot AI requested a review from Evangelink April 20, 2026 09:55
@Evangelink Evangelink marked this pull request as ready for review April 20, 2026 10:23
Copilot AI review requested due to automatic review settings April 20, 2026 10:24
@Evangelink
Member

/evaluate

Contributor

Copilot AI left a comment


Pull request overview

This PR updates the MSTest v2→v3 evaluation scenario to better align the scenario’s assertions/rubric with what the fixture (fixtures/v2-nuget/UserServiceTests.cs) actually exercises, focusing on Assert.AreEqual/Assert.AreSame object-overload removal and generic type parameter migration guidance.

Changes:

  • Added an output_matches assertion intended to gate on guidance about Assert.AreEqual/Assert.AreSame overload removal and generic type parameter migration.
  • Updated the scenario rubric wording from AreNotEqual to AreSame and made it fixture-specific.

| File | Description |
| --- | --- |
| tests/dotnet-test/migrate-mstest-v1v2-to-v3/eval.yaml | Tightens Goal 2 scenario assertions and adjusts rubric wording to match the v2 NuGet fixture semantics. |

Copilot's findings


  • Files reviewed: 1/1 changed files
  • Comments generated: 1

Comment thread tests/dotnet-test/migrate-mstest-v1v2-to-v3/eval.yaml Outdated
github-actions Bot added a commit that referenced this pull request Apr 20, 2026
@github-actions
Contributor

Skill Validation Results

| Skill | Scenario | Quality | Skills Loaded | Overfit | Verdict |
| --- | --- | --- | --- | --- | --- |
| migrate-mstest-v1v2-to-v3 | Migrate MSTest v1 project with assembly reference | 3.0/5 → 5.0/5 🟢 | ✅ migrate-mstest-v1v2-to-v3; tools: skill, edit, bash / ✅ migrate-mstest-v1v2-to-v3; tools: skill | ✅ 0.04 | |
| migrate-mstest-v1v2-to-v3 | Migrate MSTest v2 NuGet project to v3 | 3.3/5 → 3.3/5 | ✅ migrate-mstest-v1v2-to-v3; tools: skill | ✅ 0.04 | [1] |
| migrate-mstest-v1v2-to-v3 | Fix Assert.AreEqual object overload errors after v3 upgrade | 3.3/5 → 4.3/5 🟢 | ✅ migrate-mstest-v1v2-to-v3; tools: skill, edit / ⚠️ NOT ACTIVATED | ✅ 0.04 | [2] |
| migrate-mstest-v1v2-to-v3 | Migrate from .testsettings to .runsettings | 4.0/5 → 4.0/5 | ✅ migrate-mstest-v1v2-to-v3; tools: skill, bash | ✅ 0.04 | [3] |
| migrate-mstest-v1v2-to-v3 | Fix DataRow type mismatch errors after v3 upgrade | 4.3/5 → 3.0/5 🔴 | ✅ migrate-mstest-v1v2-to-v3; tools: skill, read_bash / ✅ migrate-mstest-v1v2-to-v3; tools: skill | ✅ 0.04 | |
| migrate-mstest-v1v2-to-v3 | Migrate to MSTest.Sdk project style | 3.0/5 → 5.0/5 🟢 | ✅ migrate-mstest-v1v2-to-v3; tools: skill, edit, bash | ✅ 0.04 | [4] |
| migrate-mstest-v1v2-to-v3 | Handle dropped target framework during v3 migration | 4.3/5 → 5.0/5 🟢 | ✅ migrate-mstest-v1v2-to-v3; tools: skill / ⚠️ NOT ACTIVATED | ✅ 0.04 | [5] |
| migrate-mstest-v1v2-to-v3 | Migrate complex MSTest v2 project with testsettings, DataRow issues, and dropped TFM | 3.7/5 → 5.0/5 🟢 | ✅ migrate-mstest-v1v2-to-v3; tools: skill | ✅ 0.04 | [6] |
| migrate-mstest-v1v2-to-v3 | Correctly identify MSTest v1 vs v2 and recommend different migration paths | 4.3/5 → 5.0/5 🟢 | ✅ migrate-mstest-v1v2-to-v3; tools: skill | ✅ 0.04 | |

[1] ⚠️ High run-to-run variance (CV=5.73) — consider re-running with --runs 5. (Isolated) Quality unchanged but weighted score is -15.5% due to: judgment, quality
[2] ⚠️ High run-to-run variance (CV=1.66) — consider re-running with --runs 5. (Plugin) Quality improved but weighted score is -27.4% due to: judgment, quality, tokens (63909 → 153831), tool calls (6 → 10), time (72.1s → 92.1s)
[3] (Isolated) Quality unchanged but weighted score is -16.5% due to: judgment, quality, tokens (59244 → 71747)
[4] ⚠️ High run-to-run variance (CV=0.56) — consider re-running with --runs 5
[5] ⚠️ High run-to-run variance (CV=0.99) — consider re-running with --runs 5
[6] ⚠️ High run-to-run variance (CV=1.32) — consider re-running with --runs 5
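
The footnotes flag variance using CV (coefficient of variation). As a rough illustration, the Python sketch below computes CV as the sample standard deviation divided by the absolute mean of per-run values. That is the textbook definition and an assumption here, since the thread does not say how the validator computes it; the run values are made up.

```python
from statistics import mean, stdev


def coefficient_of_variation(values: list[float]) -> float:
    """Sample standard deviation divided by the absolute mean."""
    m = mean(values)
    if m == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return stdev(values) / abs(m)


# Hypothetical per-run quality improvements (with skill minus baseline)
steady = [1.0, 1.1, 0.9]
noisy = [2.0, -1.0, 0.0]

print(round(coefficient_of_variation(steady), 2))
print(round(coefficient_of_variation(noisy), 2))
```

Under this definition, values well above 1 arise when the mean is small relative to the spread, as in the `noisy` example, which is one plausible reading of the large CVs reported above.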

Model: claude-opus-4.6 | Judge: claude-opus-4.6

🔍 Full Results - additional metrics and failure investigation steps

▶ Sessions Visualisation -- interactive replay of all evaluation sessions

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings April 20, 2026 11:44
@github-actions
Contributor

Skill Coverage Report

| Plugin | Skill | Covered | Coverage |
| --- | --- | --- | --- |
| dotnet-test | migrate-mstest-v1v2-to-v3 | 13/16 | 81.2% |

Uncovered: dotnet-test/migrate-mstest-v1v2-to-v3
  • [Validation] Project builds with zero errors (line 184)
  • [Validation] All tests pass (dotnet test) -- compare pass/fail counts to pre-migration baseline (line 185)
  • [CodePattern] Assert.AreNotEqual (line 143)

Contributor

Copilot AI left a comment


Pull request overview

This PR tightens the MSTest v2→v3 migration eval scenario so its automated assertions and rubric align with what the fixture (fixtures/v2-nuget/UserServiceTests.cs) actually demonstrates (AreEqual/AreSame object-overload migration + generic type parameter guidance), avoiding checks for unrelated assertions.

Changes:

  • Added output_matches assertions requiring the agent output to mention Assert.AreEqual/Assert.AreSame and to discuss object-overload removal / generic type parameter migration.
  • Updated the scenario rubric item to reference Assert.AreEqual/AreSame (instead of AreNotEqual) and tie it to UserServiceTests.cs.
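
One way to express "mention the API and discuss the migration" is a pair of patterns that must both match. The Python sketch below illustrates that idea. Both patterns are hypothetical (the final eval.yaml patterns are not shown in this thread), and it assumes the harness treats multiple output_matches assertions as a logical AND.

```python
import re

# Hypothetical patterns; the merged eval.yaml may use different ones
API_MENTION = re.compile(r"Assert\.(AreEqual|AreSame)")
MIGRATION_GUIDANCE = re.compile(
    r"object overload|overload removal|generic type parameters?", re.IGNORECASE
)


def passes(output: str) -> bool:
    """Pass only when the output names the API and discusses the migration."""
    return bool(API_MENTION.search(output)) and bool(MIGRATION_GUIDANCE.search(output))


print(passes("Assert.AreSame no longer has an object overload; use generic type parameters."))
print(passes("The tests call Assert.AreEqual in several places."))
```

Splitting the check this way means an output that only names Assert.AreEqual, without any overload or generic-parameter guidance, no longer passes.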

| File | Description |
| --- | --- |
| tests/dotnet-test/migrate-mstest-v1v2-to-v3/eval.yaml | Tightens Goal 2 assertions to require AreEqual/AreSame + overload-removal/generic-parameter guidance and updates rubric wording to match the fixture. |

Copilot's findings


  • Files reviewed: 1/1 changed files
  • Comments generated: 0

@Evangelink
Member

/evaluate

github-actions Bot added a commit that referenced this pull request Apr 20, 2026
@github-actions
Contributor

Skill Validation Results

| Skill | Scenario | Quality | Skills Loaded | Overfit | Verdict |
| --- | --- | --- | --- | --- | --- |
| migrate-mstest-v1v2-to-v3 | Migrate MSTest v1 project with assembly reference | 3.0/5 → 4.7/5 🟢 | ✅ migrate-mstest-v1v2-to-v3; tools: skill, bash, glob / ✅ migrate-mstest-v1v2-to-v3; tools: skill | ✅ 0.05 | |
| migrate-mstest-v1v2-to-v3 | Migrate MSTest v2 NuGet project to v3 | 3.3/5 → 3.3/5 | ✅ migrate-mstest-v1v2-to-v3; tools: skill | ✅ 0.05 | [1] |
| migrate-mstest-v1v2-to-v3 | Fix Assert.AreEqual object overload errors after v3 upgrade | 3.0/5 → 4.7/5 🟢 | ✅ migrate-mstest-v1v2-to-v3; tools: skill / ⚠️ NOT ACTIVATED | ✅ 0.05 | [2] |
| migrate-mstest-v1v2-to-v3 | Migrate from .testsettings to .runsettings | 3.7/5 → 4.0/5 🟢 | ✅ migrate-mstest-v1v2-to-v3; tools: skill, bash | ✅ 0.05 | [3] |
| migrate-mstest-v1v2-to-v3 | Fix DataRow type mismatch errors after v3 upgrade | 3.7/5 → 3.0/5 🔴 | ✅ migrate-mstest-v1v2-to-v3; tools: skill | ✅ 0.05 | [4] |
| migrate-mstest-v1v2-to-v3 | Migrate to MSTest.Sdk project style | 3.0/5 → 5.0/5 🟢 | ✅ migrate-mstest-v1v2-to-v3; tools: skill | ✅ 0.05 | [5] |
| migrate-mstest-v1v2-to-v3 | Handle dropped target framework during v3 migration | 5.0/5 → 5.0/5 | ✅ migrate-mstest-v1v2-to-v3; tools: skill | ✅ 0.05 | [6] |
| migrate-mstest-v1v2-to-v3 | Migrate complex MSTest v2 project with testsettings, DataRow issues, and dropped TFM | 4.3/5 → 5.0/5 🟢 | ✅ migrate-mstest-v1v2-to-v3; tools: skill | ✅ 0.05 | [7] |
| migrate-mstest-v1v2-to-v3 | Correctly identify MSTest v1 vs v2 and recommend different migration paths | 4.7/5 → 5.0/5 🟢 | ✅ migrate-mstest-v1v2-to-v3; tools: skill | ✅ 0.05 | [8] |

[1] ⚠️ High run-to-run variance (CV=2.80) — consider re-running with --runs 5
[2] ⚠️ High run-to-run variance (CV=2.29) — consider re-running with --runs 5
[3] ⚠️ High run-to-run variance (CV=1.49) — consider re-running with --runs 5. (Isolated) Quality improved but weighted score is -18.0% due to: judgment, quality, tokens (58649 → 76123)
[4] ⚠️ High run-to-run variance (CV=2.98) — consider re-running with --runs 5
[5] ⚠️ High run-to-run variance (CV=2.43) — consider re-running with --runs 5
[6] ⚠️ High run-to-run variance (CV=0.67) — consider re-running with --runs 5. (Plugin) Quality unchanged but weighted score is -4.3% due to: tokens (30697 → 49088), tool calls (2 → 3)
[7] ⚠️ High run-to-run variance (CV=1.42) — consider re-running with --runs 5
[8] ⚠️ High run-to-run variance (CV=0.92) — consider re-running with --runs 5

Model: claude-opus-4.6 | Judge: claude-opus-4.6

🔍 Full Results - additional metrics and failure investigation steps

▶ Sessions Visualisation -- interactive replay of all evaluation sessions

@Evangelink Evangelink merged commit c14d90d into main Apr 20, 2026
34 checks passed
@Evangelink Evangelink deleted the copilot/fix-comments-in-review-thread branch April 20, 2026 12:59
sayedihashimi pushed a commit to sayedihashimi/skills that referenced this pull request Apr 20, 2026
…net#550)

* Initial plan

* Align MSTest v2->v3 eval rubric with fixture assertions

Agent-Logs-Url: https://github.com/dotnet/skills/sessions/055e6526-4b7d-4457-9004-2c8c99969cd1

Co-authored-by: Evangelink <11340282+Evangelink@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Evangelink <11340282+Evangelink@users.noreply.github.com>